Chapter 1

Introduction  to  the  Distributed  Wafer  Scale 
Radio 


The goal of this research project was to determine the feasibility and likely performance of a wafer
scale distributed radio. The vision for such a radio is a monolithic wafer consisting of a
repetitive pattern of building blocks, as shown in Fig. 1.1, where each block consists of a wideband
transceiver and several antenna elements covering a wide frequency range. Integrating the
entire functionality of the radio onto a single wafer offers several advantages.

Furthermore, the aim was to quantify the technical challenges and ultimate performance of
a wafer scale distributed microwave/mm-wave radio implemented in deeply scaled silicon
technology (CMOS or SiGe BiCMOS). The envisioned radio operates over several broad frequency
bands spanning 10-90 GHz with multiple spatial beams, in essence exploiting frequency, time,
and spatial degrees of freedom. Even though the underlying electronics can be implemented in
nanoscale CMOS or SiGe, which is known to suffer from lower power handling, higher noise,
and lower reliability than other semiconductor technologies, the distributed radio's performance
could far exceed today's systems in terms of size, robustness, cost, and power. This improvement
in performance (despite the inferior devices) is obtained by utilizing many hundreds to thousands
of radiating elements for signal processing, beam forming, redundancy, and a combination of
near- and far-field spatial power combining.
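
As a rough illustration of why many low-power elements can outperform a single high-power device,
the sketch below evaluates the ideal scaling of a coherent array. It assumes N identical, lossless,
isotropic elements that combine perfectly in the far field, so the effective isotropic radiated
power grows as N^2 (a factor of N from total power and a factor of N from array gain). The function
names and the 10 dBm per-element power are illustrative assumptions, not values from this report.

```python
import math


def ideal_array_eirp_dbm(element_power_dbm: float, num_elements: int) -> float:
    """EIRP of an ideal coherent array of isotropic elements.

    Assumes lossless, perfectly phased elements: total radiated power scales
    with N and array gain scales with N, so EIRP scales with N**2.
    """
    return element_power_dbm + 20.0 * math.log10(num_elements)


def ideal_receive_snr_gain_db(num_elements: int) -> float:
    """SNR improvement of an ideal receive array with uncorrelated noise per element."""
    return 10.0 * math.log10(num_elements)


if __name__ == "__main__":
    for n in (1, 100, 1000):
        print(n, round(ideal_array_eirp_dbm(10.0, n), 1), round(ideal_receive_snr_gain_db(n), 1))
```

Real arrays fall short of these figures because of element losses, mutual coupling, and the phase
and amplitude errors analyzed in Chapter 2.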


1.1  Motivation 

If such a wafer scale distributed radio can be realized, several existing applications (such as
radar and communication links) would immediately benefit, while many new applications would
emerge. Given the raw signal processing power of an entire silicon wafer, we envision the wafer
scale radio modulating/de-modulating tens to hundreds of different channels simultaneously. The
additional spatial resolution brought by the use of higher mm-wave frequencies also opens the
possibility of realizing a high resolution mm-wave imaging system for applications such as
imaging in low visibility conditions (fog), hidden weapon detection, and medical applications such as
tumor detection and stroke-type identification. Today these systems are not in widespread use
due to several limitations, including cost, but more importantly the difficulty in manufacturing
and assembling the various components reliably, and the resulting bulky size due to the modular
approach of traditional MMIC packaging of sub-components.

Figure 1.1: A wafer scale distributed radio.

A single monolithic wafer obviates the assembly process and presents a compact and lightweight
solution. The wideband capability of such a device could also be used to realize an ultrawideband
(UWB) ground-penetrating radar or a portable “X-ray” (based on mm-wave radiation) for
in-the-field detection of broken bones and other injuries. Higher frequencies offer higher spatial
resolution but suffer higher attenuation when traveling through tissue, walls, or the ground. A wafer
scale system employing true time delay elements can be used for spatial power combining of
a wideband signal, offering the potential for very high effective radiated power in the desired
spatial direction. The powerful digital signal processing offered by nanoscale silicon enables
extremely intelligent arrays to be realized on-wafer, naturally exploiting the distributed computing
offered by silicon without the requirement to route several high data rate digital signals off-chip
as with traditional systems. Given that the power capability of semiconductor technology drops
dramatically with frequency, it is fortunate that more radiators can be realized in an array due to
the decreasing wavelength. For instance, each reticle on the wafer scale radio may contain only
4-6 elements at 10 GHz (spaced at a fraction of a wavelength). However, using the same spacing
in wavelengths, hundreds of elements could be integrated into a reticle at 90 GHz. In this way,
the output power and sensitivity of the wafer as a whole would be relatively constant with frequency.
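
The element counts quoted above can be sanity-checked with a short calculation. The sketch below
assumes a square 25 mm reticle (the size given in Section 1.2.1), a free-space half-wavelength
element pitch, and elements placed on a square grid that includes the reticle edges. All three are
simplifying assumptions made here for illustration, and the function name is hypothetical.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def elements_per_reticle(freq_hz: float, reticle_mm: float = 25.0,
                         pitch_wavelengths: float = 0.5) -> int:
    """Count grid points with the given pitch that fit on a square reticle.

    Assumes the free-space wavelength and a square grid that includes both edges.
    """
    wavelength_mm = SPEED_OF_LIGHT_M_S / freq_hz * 1e3
    pitch_mm = pitch_wavelengths * wavelength_mm
    per_side = math.floor(reticle_mm / pitch_mm) + 1
    return per_side * per_side


if __name__ == "__main__":
    for f_ghz in (10, 30, 60, 90):
        print(f_ghz, "GHz:", elements_per_reticle(f_ghz * 1e9), "elements")
```

Under these assumptions the count grows from a handful of elements at 10 GHz to a few hundred at
90 GHz, in line with the scaling argument in the text.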


1.2  Technical  Rationale 

Modern silicon technology offers ultrafast transistors, with f_T > 200 GHz in today's 45 nm CMOS
and f_T > 300 GHz in SiGe. While extremely fast, these transistors suffer from several limitations
which affect the performance of high dynamic range analog and RF circuits. Principally, the
low supply voltage hampers the dynamic range, and, combined with the low intrinsic gain, high
variability of nanoscale transistors, and increasing 1/f noise due to high-K materials, it becomes
clear that analog operations are increasingly difficult to realize in silicon. In RF applications, the
modest noise performance in the microwave and mm-wave band (operating close to f_T) has limited
the application of silicon technology to short range wireless systems. Long range communication
and in particular military communication systems require extremely wide dynamic range
front-ends and radios, robust performance in the presence of strong interfering and jamming signals,
high output power, and wideband operation. Most of these specifications cannot be met with
traditional silicon technology above 10 GHz. In this project we explore the potential for a new kind
of wafer scale distributed radio that can overcome the limitations of silicon by exploiting its high
levels of integration and a more intimate relationship between the electronics and electromagnetic
structures. We plan to build on the success of the TEAM project, where researchers demonstrated
highly integrated CMOS and SiGe radio transceivers operating in the 60 GHz band, and where circuit
building blocks operating beyond 100 GHz, close to the limits of activity for silicon, were realized.
The limitations of conventional radios and the solutions offered by the wafer scale distributed radio
are summarized below.

1.2.1  Wafer  Scale  Distributed  Radio 

The proposed system comprises approximately 100 reticles, each 25 mm x 25 mm, on a 12-inch wafer.
Each reticle forms a distributed radio over several broad frequency ranges, e.g. 10-30 GHz,
30-60 GHz, and 60-90 GHz. In turn, each reticle contains several sub-element transceivers
operating over these same frequency ranges. Each sub-element transceiver comprises distributed
active near-field and far-field antennas and couplers, a broadband front-end radio with a synthesized
true time-delay element, and a bank of high performance baseband signal processing elements.

An important question concerns the practical realizability of a very large phased array.
The performance of an extremely large phased array is limited by mismatches in the gain and
phase of each sub-element. In Chapter 2 we explore these limitations in detail, and in particular
use statistical analysis to show that for a very large array, the effect of inaccuracies can be tolerated
if the errors can be kept as random as possible. The antenna pattern on the silicon has a layout
density commensurate with the frequency of operation, as shown in Fig. 1.2. Since Si on-chip
antennas are inefficient and consume large area at the lower edge of the band, a major challenge is
the efficient realization of antenna elements. The key issue is to avoid lossy radiation modes in the
silicon substrate. One option which was briefly explored is the use of a highly resistive substrate in
the silicon processing, such as SOI technology. Alternatively, the antenna can be grown on top of
the wafer, or on a daughter wafer (Fig. 1.3), with additional low temperature masking steps and
electrically shielded from the substrate with a thick dielectric layer. The post-processing option is
low cost and mass producible due to the relaxed lithography (on the order of 10 µm) and offers much
lower losses due to the spatial isolation from the substrate. This fabrication can be performed with
conventional technology, e.g. “redistribution layers” used in a flip-chip process technology. Flip-chip
bumps can be used for power and ground whereas all other signals travel off the chip using the
integrated antennas. However, interconnects from one wafer to another are lossy, especially at
higher frequencies such as 94 GHz, and it is preferable to realize as much functionality in a single
wafer as possible.

Figure 1.2: Each reticle consists of a pattern of broadband antennas covering lower to higher
frequencies with increasing density.

In this project the on-chip antenna structures were carefully designed and simulated using
full-wave electromagnetic simulation. Simulation of antenna arrays is key to determining the
practical limits of integration, such as the minimum spacing between antennas, achievable antenna
gain and isolation (both isolation in distance and at different frequencies), side lobe radiation
levels, the impact of unintentional radiation on other blocks, and the spatial nulls achievable to
cancel out interference signals. These simulations are summarized in Chapter 4.

Figure 1.3: A wafer scale distributed radio composed of two wafers. A low-cost antenna wafer is
bonded to an advanced silicon wafer containing the electronics.

Beam  forming  is  a  key  ingredient  in  the  function  of  the  proposed  system.  There  are  several 
different  techniques  possible  to  introduce  the  needed  phase  shift  in  the  signal  path,  such  as  RF 
phase  shifters  or  time-delay  elements,  LO  phase  shift,  or  baseband  phase  shifting.  The  baseband 
phase  shifter  is  the  most  flexible  approach  but  requires  precise  timing  and  synchronization.  RF 
phase  shifters  are  typically  narrowband,  introduce  attenuation  unless  amplification  is  used,  or 
require  active  variable  gain  elements  in  an  I/Q  combiner  structure.  Active  elements  such  as  the 
VGA  generally  limit  the  dynamic  range  of  the  system.  In  Chapter  3,  we  describe  a  new  phase 
shifter  structure  which  realizes  a  synthesized  transmission  line  with  variable  delay  by  varying  the 
capacitance  of  the  line  through  MOS  varactors  and  the  inductance  of  the  line  through  the  action 
of  the  reflected  inductance  from  the  secondary  windings  of  a  transformer  [3].  The  advantage  of 
the proposed structure is the ability to process wideband signals. An early prototype phase shifter
shows  a  good  impedance  match  and  performs  up  to  11  GHz  in  90nm  technology.  The  frequency 
limit  is  set  by  the  cut-off  frequency  of  the  synthesized  transmission  line,  which  improves  in  direct 
proportion  to  the  varactor  capacitance,  which  means  that  technology  scaling  can  lead  to  a  great 
improvement  in  the  frequency  performance  of  such  a  structure. 
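
To make the tuning mechanism concrete, the sketch below uses the textbook lumped model of a
synthesized (artificial) transmission line, in which each LC section contributes a delay of
sqrt(LC) and the line presents a characteristic impedance of sqrt(L/C). This idealized model and
the component values are assumptions chosen for illustration; they are not taken from the prototype
described here. The example shows why tuning both the capacitance and the inductance is attractive:
scaling L and C together changes the delay while leaving the impedance unchanged.

```python
import math


def section_delay_s(l_henry: float, c_farad: float) -> float:
    """Delay of one ideal LC section, valid well below the line cut-off."""
    return math.sqrt(l_henry * c_farad)


def line_impedance_ohm(l_henry: float, c_farad: float) -> float:
    """Characteristic impedance of the ideal lumped line."""
    return math.sqrt(l_henry / c_farad)


if __name__ == "__main__":
    l0, c0 = 250e-12, 100e-15  # hypothetical per-section values: 250 pH, 100 fF
    for scale in (1.0, 1.2, 1.5):
        # Varying only C changes both the delay and the impedance.
        only_c = (section_delay_s(l0, scale * c0), line_impedance_ohm(l0, scale * c0))
        # Varying L and C by the same factor changes the delay at constant impedance.
        both = (section_delay_s(scale * l0, scale * c0),
                line_impedance_ohm(scale * l0, scale * c0))
        print(scale, only_c, both)
```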

Broadband  electronic  building  blocks  are  needed  for  amplification  and  frequency  conversion 
of  the  RF  energy.  Traveling  wave  amplifiers  (TWA)  have  good  output  power  due  to  the  power 
combining  of  multiple  devices,  good  linearity,  and  reasonable  noise  performance.  We  describe 
new  techniques  for  obtaining  higher  efficiency  from  distributed  power  amplifiers  in  Chapter  3.  A 
prototype  has  a  simulated  bandwidth  exceeding  100  GHz  in  SiGe  technology  with  high  output 
power  and  efficiency.  Due  to  the  extremely  broadband  operation,  these  blocks  are  generic  and  can 
be  used  for  an  extremely  wideband  front-end,  operational  from  10-90  GHz. 

The  wafer  scale  radio  requires  the  cooperation  of  several  hundred  elements  to  realize  a  high 
power  transmitter  and  a  high  dynamic  range  receiver.  Self-configuration  and  testing  is  enabled  by 
cooperation,  eliminating  the  expensive  and  time-consuming  testing  associated  with  the  fabrication 
of  mm-wave  systems.  Dynamic  range  is  improved  by  interference  cancellation  through  dynamic 
beam  forming.  The  noise  figure  and  output  power  are  improved  by  the  phased  array  spatial  power 
combining and high directivity. For dense urban applications, the sensitivity of the wafer scale radio
is improved due to exploitation of rich spatial diversity in the received signal. None of this is possible
unless the system as a whole can dynamically select, adjust, and program the micro radiators and
receivers to the appropriate phase, frequency, and power level. In effect, the reticles form a high
bandwidth sensor network which must be programmed in a dynamic fashion. Due to variability,
the performance of each reticle will be different and some reticles may be non-operational. We
investigated the architecture of a high-speed, low-latency “back plane” wired/wireless infrastructure
to carry data from the master reticle(s) to the slave reticles. Each packet of data is time stamped
with  delivery  addresses  so  that  a  cohort  of  radiating  elements  can  coordinate  with  each  other  to 
form  a  beam  at  the  intended  frequency.  Given  that  several  beams  can  be  formed  by  such  a  radiating 
element  carrying  different  information,  the  bandwidth  of  the  back-plane  (which  would  effectively 
need  to  span  the  entire  wafer)  will  need  to  be  rather  large.  We  investigated  the  synchronization 
limits  imposed  by  technology  in  the  realization  of  the  wafer  scale  distributed  radio  to  determine 
the  highest  data  rates  and  the  maximum  number  of  possible  beams  that  can  be  practically  formed. 
These  issues  are  addressed  in  Chapter  5. 


1.3  Executive  Summary 

This  research  project  has  focused  on  four  primary  areas  related  to  the  wafer  scale  distributed  radio. 
First  and  foremost  we  investigated  the  possibility  of  exploiting  a  very  large  array  for  beam  forming 
and  beam  nulling  (spatial  filtering).  A  careful  analysis  of  the  antenna  element  gain  and  phase 
mismatch  shows  that  a  very  large  array,  when  designed  correctly,  is  actually  insensitive  to  the 
random errors in the components. This is an important result which contradicts some of the
well-known relations between element accuracy and beam pattern. This is very encouraging, though,
because  in  a  wafer  scale  radio  we  can  easily  exploit  a  great  number  of  transceivers  and  antenna 
elements  but  it  is  much  harder  to  control  the  exact  phase  and  amplitude  of  signals  distributed 
across  the  face  of  an  entire  wafer. 

Second,  we  analyzed  the  performance  limitations  for  integrated  antenna  elements.  To  date  all 
measurements  on  integrated  silicon  antennas  exhibit  radiation  efficiencies  below  10%,  which  is 
prohibitively  low.  In  our  work  we  have  found  that  it  is  possible  to  realize  antenna  efficiencies 
as  high  as  70%  through  wafer  thinning,  and  entire  arrays  can  be  realized  with  efficiency  of  40%. 
These  results  are  critical,  especially  for  higher  frequencies,  since  routing  signals  off  chip,  or  even 
to  a  daughter  high  resistivity  wafer,  at  94  GHz  for  instance,  is  extremely  lossy.  When  one  includes 
the  losses  of  a  pad,  ESD,  and  bonding  and  interconnect,  the  on-chip  antenna  solution  is  very 
compelling. 

Third,  we  demonstrated  that  it  is  possible  to  build  extremely  wideband  building  blocks  using 
silicon  technology.  These  building  blocks  could  form  the  front-end  of  a  broadband  transceiver 
or  be  utilized  in  an  ultra-wideband  system  (such  as  a  radar  imager).  We  focused  on  the  most 
difficult  block,  the  power  amplifier.  We  show  that  using  a  new  distributed  amplifier  topology, 
where  the  devices  and  transmission  line  impedance  are  tapered,  it  is  possible  to  realize  bandwidths 
over  100  GHz  with  good  gain  and  high  output  power  (17  dBm)  and  high  efficiency  (25%).  We 
also  demonstrate  new  architectures  for  phase  shifters  which  are  compact  and  can  provide  true 
time  delay,  which  is  critical  in  wideband  communication  systems.  The  new  phase  shifter  can  be 
embedded into an amplifier to realize a Variable Delay Amplifier (VDA).

Finally,  we  studied  the  power  and  area  requirements  in  order  to  properly  synchronize  an  entire 
wafer.  Synchronization  is  required  since  we  desire  collaboration  between  transceivers  operating 
across  a  wafer.  In  order  to  beam  form  or  to  perform  space-time  coding,  the  signals  applied  to 
the  antennas  must  be  time  synchronized.  Phase  synchronization  is  also  required  in  a  phased-array 
for  beam  forming/nulling.  Even  though  systematic  offsets  can  be  tolerated,  any  drift  or  unknown 
phase  error  can  lead  to  degraded  performance.  We  find  that  by  carefully  designing  the  clock-tree 
as  a  network  of  standing  wave  oscillators,  and  by  utilizing  injection  locking,  a  clock  tree  can  be 
realized with small skew (1 ps) while dissipating only 170 W on the wafer.
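
To put the skew figure in perspective, a timing error Δt corresponds to a carrier phase error of
360° · f · Δt. The sketch below evaluates this standard time-to-phase conversion at a few
frequencies in the 10-90 GHz range. The function name is hypothetical, and the numbers are simple
arithmetic rather than results from this report.

```python
def skew_to_phase_deg(skew_s: float, freq_hz: float) -> float:
    """Phase error in degrees caused by a timing skew at a given carrier frequency."""
    return 360.0 * freq_hz * skew_s


if __name__ == "__main__":
    for f_ghz in (10, 60, 90):
        print(f_ghz, "GHz:", round(skew_to_phase_deg(1e-12, f_ghz * 1e9), 1), "degrees")
```

A 1 ps skew is therefore a few degrees at 10 GHz but roughly 32 degrees at 90 GHz.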

Chapter 2

Phased  Array  System  Design 
Considerations 


This  chapter  analyzes  the  effect  of  errors  in  antenna  weights  on  the  performance  of  adaptive  array 
systems,  both  in  the  case  when  an  array  is  used  to  maximize  the  gain  in  a  desired  direction  and 
in  the  case  when  an  array  is  used  to  null  interfering  signals.  We  begin  by  deriving  an  explicit 
characterization  of  the  loss  in  array  gain  due  to  phase  errors  in  the  optimal  antenna  weights.  Then, 
we  examine  interference  rejection  in  the  presence  of  amplitude  and  phase  errors  in  the  antenna 
weights.  We  prove  that  the  loss  in  interference  rejection  is  independent  of  the  number  of  antennas. 
For  both  cases,  we  give  numerical  simulations  that  validate  our  analysis. 

2.1 Impact of phase and amplitude errors on array performance

Many modern communication systems employ adaptive antennas in order to improve their
capacity, coverage, and reliability. Unlike conventional fixed antenna systems, adaptive antenna arrays
dynamically adjust their beam patterns in response to their environment. Adaptive arrays can
extend the range by focusing most of the radio frequency (RF) power on a desired target. This is
known as beamforming or beam-steering. Adaptive arrays can also reject unwanted interference
signals by placing nulls in the direction of the interferers, which is known as beam-nulling or
null-steering. Even if the directions of the interferers are unknown, adaptive arrays can still reduce
signal propagation in undesired directions by using side lobe suppression.

Adaptive  arrays  are  composed  of  multiple  antenna  elements  that  can  be  arranged  in  different 
geometries  (antennas  are  usually  spaced  at  least  half  a  wavelength  apart)  [10].  Larger  arrays 
provide  more  gain  and  degrees  of  freedom.  The  beam  pattern  (the  locations  of  peaks  and  nulls, 
and  the  heights  of  the  side  lobes)  is  shaped  by  controlling  the  amplitudes  and  phases  of  the  RF 
signals  transmitted  and  received  from  each  antenna  element.  For  this  reason,  adaptive  arrays  are 
often  referred  to  as  phased  arrays.  Precise  control  over  both  amplitudes  and  phases  is  required 
to achieve good performance. However, various factors such as finite resolution, noise, mismatch
in  circuit  elements,  and  channel  uncertainty  limit  the  precision  that  can  be  achieved  in  practice. 
Many  of  these  error  sources  are  random,  and  cannot  be  compensated  for  using  pre-calibration 
or  adaptive  signal  processing  techniques.  The  limited  precision  will  degrade  the  performance  of 
the  array  (gain  and  interference  rejection).  In  this  chapter,  we  examine  the  impact  of  phase  and 
amplitude  errors  on  the  array  gain  (beamforming)  in  Section  2.2  and  on  interference  rejection 
(beam-nulling)  in  Section  2.3.  We  provide  both  mathematical  proofs  and  simulation  results  that 
characterize  the  array  performance  as  a  function  of  phase  and  amplitude  errors. 


2.2  Beamforming 


Complex  Baseband  Channel  Representation 


Figure 2.1: A communication system with an adaptive array at the receiver. The narrowband signal
$s[n]$ arrives at each antenna shifted in phase by $\psi_i$. The receiver applies a phase shift of
$\phi_i$ at each antenna and sums the signals.

Consider the array of $N$ elements shown in Figure 2.1. A signal $s[n]$, sent by a remote
transmitter, arrives at each antenna $i$ in the array shifted in phase by $\psi_i$.¹ Antenna $i$ will
then apply a phase-shift $\phi_i$ to the incoming signal. Therefore, the overall complex (baseband)
channel response $H$ at the output of the receiver array is given by:²

$$H = \sum_{i=1}^{N} e^{j(\phi_i - \psi_i)}$$

¹ In general, signals arrive at different antenna elements with different delays. However, for
narrowband signals, time delays can be approximated with phase shifts [41]. We define signals as
narrowband when the fractional bandwidth (the ratio between the signal bandwidth and carrier
frequency) is very small (e.g. less than 1%). In this chapter, we shall assume that all signals are
narrowband.


To maximize the magnitude of $H$, $\phi_i$ is chosen equal to $\psi_i$, in which case $|H|^2$ reaches its
maximum value of $|H_{opt}|^2 = N^2$. In practice, however, factors such as quantization, clock jitter,
and other sources of noise make it virtually impossible to realize the desired phase-shifts. Most of
these errors are unpredictable and time varying, and are best modeled with random variables:

$$\phi_i = \psi_i + \delta_i$$

We will assume that $\delta_i \sim U[-\delta_{max}, \delta_{max}]$, where $0 < \delta_{max} \le 180^\circ$ is an upper bound on the
magnitude of the phase deviation.³ Furthermore, we assume that the errors are independent and
identically distributed (i.i.d.) across different antennas. In this case the channel response becomes:

$$H_{opt} = \sum_{i=1}^{N} e^{j\delta_i} = \sum_{i=1}^{N} \cos(\delta_i) + j \sum_{i=1}^{N} \sin(\delta_i)$$

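We wish to characterize the effect of the phase errors on the square magnitude of the channel
response. To simplify the analysis, we introduce two new random variables $X_i = \cos(\delta_i)$ and
$Y_i = \sin(\delta_i)$ and compute the following expectations:

$$\mu_X = E[X_i] = E[\cos(\delta_i)] = \frac{1}{2\delta_{max}} \int_{-\delta_{max}}^{\delta_{max}} \cos(x)\,dx = \frac{1}{\delta_{max}} \int_{0}^{\delta_{max}} \cos(x)\,dx = \frac{\sin(\delta_{max})}{\delta_{max}}$$

$$\mu_{X^2} = E[X_i^2] = \frac{1}{2\delta_{max}} \int_{-\delta_{max}}^{\delta_{max}} \cos^2(x)\,dx = \frac{1}{\delta_{max}} \int_{0}^{\delta_{max}} \cos^2(x)\,dx = \frac{1}{2\delta_{max}} \int_{0}^{\delta_{max}} \left(1 + \cos(2x)\right)dx = \frac{1}{2} + \frac{\sin(2\delta_{max})}{4\delta_{max}}$$

$$\mu_Y = E[Y_i] = E[\sin(\delta_i)] = 0 \quad \text{(by symmetry)}$$

$$\mu_{Y^2} = E[Y_i^2] = \frac{1}{2\delta_{max}} \int_{-\delta_{max}}^{\delta_{max}} \sin^2(x)\,dx = \frac{1}{\delta_{max}} \int_{0}^{\delta_{max}} \sin^2(x)\,dx = \frac{1}{2\delta_{max}} \int_{0}^{\delta_{max}} \left(1 - \cos(2x)\right)dx = \frac{1}{2} - \frac{\sin(2\delta_{max})}{4\delta_{max}}$$

² The channel response is identical in the scenario with multiple transmitters and a single receive antenna.

³ We assume a uniform distribution to simplify the calculations. Note that no assumptions were made regarding
the geometry of the array or the direction of arrival, so the result holds for an arbitrary array.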
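Now, we can rewrite the expression for the channel response as:

$$H_{opt} = \sum_{i=1}^{N} X_i + j \sum_{i=1}^{N} Y_i$$

$$|H_{opt}|^2 = \left(\sum_{i=1}^{N} X_i\right)^2 + \left(\sum_{i=1}^{N} Y_i\right)^2 = \sum_{k=1}^{N} \sum_{l=1}^{N} \left(X_k X_l + Y_k Y_l\right)$$

$$\Rightarrow E\left[|H_{opt}|^2\right] = \sum_{k=1}^{N} \sum_{l=1}^{N} \left(E[X_k X_l] + E[Y_k Y_l]\right)$$

$$E[X_k X_l] = \begin{cases} E[X_k]E[X_l] = \mu_X^2 & \text{when } k \ne l \text{ (using independence)} \\ E[X_k^2] = \mu_{X^2} & \text{when } k = l \end{cases}$$

$$E[Y_k Y_l] = \begin{cases} E[Y_k]E[Y_l] = 0 & \text{when } k \ne l \text{ (using independence)} \\ E[Y_k^2] = \mu_{Y^2} & \text{when } k = l \end{cases}$$

$$\Rightarrow E\left[|H_{opt}|^2\right] = (N^2 - N)\left(\mu_X^2 + \mu_Y^2\right) + N\left(\mu_{X^2} + \mu_{Y^2}\right) = (N^2 - N)\,\frac{\sin^2(\delta_{max})}{\delta_{max}^2} + N = N^2\,\frac{\sin^2(\delta_{max})}{\delta_{max}^2} + N\left(1 - \frac{\sin^2(\delta_{max})}{\delta_{max}^2}\right)$$

If we normalize $E[|H_{opt}|^2]$ by dividing by the maximum value $|H_{opt}|^2 = N^2$, we obtain:

$$\Phi_N(\delta_{max}) = \frac{E\left[|H_{opt}|^2\right]}{N^2} = \frac{\sin^2(\delta_{max})}{\delta_{max}^2} + \frac{1}{N}\left(1 - \frac{\sin^2(\delta_{max})}{\delta_{max}^2}\right)$$

$$\Phi(\delta_{max}) = \lim_{N \to \infty} \Phi_N(\delta_{max}) = \frac{\sin^2(\delta_{max})}{\delta_{max}^2}$$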
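
The closed-form result can be checked numerically. The following sketch, which uses only the Python
standard library, draws i.i.d. uniform phase errors, averages |H|^2 / N^2 over many trials, and
compares the estimate with the normalized gain for a finite array. The function names, the array
size, the trial count, and the random seed are arbitrary choices made for this illustration.

```python
import cmath
import math
import random


def normalized_gain(delta_max: float, n: int) -> float:
    """Closed-form Phi_N(delta_max) = s + (1 - s) / N with s = (sin(d) / d)**2."""
    s = (math.sin(delta_max) / delta_max) ** 2
    return s + (1.0 - s) / n


def simulated_gain(delta_max: float, n: int, trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of E[|H|^2] / N^2 for uniform phase errors."""
    total = 0.0
    for _ in range(trials):
        h = sum(cmath.exp(1j * rng.uniform(-delta_max, delta_max)) for _ in range(n))
        total += abs(h) ** 2
    return total / (trials * n * n)


if __name__ == "__main__":
    rng = random.Random(0)
    for deg in (30, 90, 180):
        d = math.radians(deg)
        print(deg, round(normalized_gain(d, 64), 4), round(simulated_gain(d, 64, 2000, rng), 4))
```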


Figure 2.2(a) shows a plot of $\Phi(\delta_{max})$ for $0 < \delta_{max} \le 180^\circ$. Figure 2.2(b) shows the same
function in dB scale. Figure 2.2(b) also shows that the calculated array gain closely matches
simulation results. Figure 2.2(c) shows that the actual distribution of the phase errors has little
impact on the loss in array gain. Notice that using a single bit of phase resolution corresponds to
$\delta_{max} = 90^\circ = \pi/2$, and $\Phi(\pi/2) = (2/\pi)^2 \approx 0.4 \approx -3.9$ dB. We expect the bound to become tighter as $N$
increases, due to the law of large numbers. The graphs in Figure 2.3 show the loss in array gain as
a result of quantizing the phase to one and two bits. Figures 2.3(g,h) show that quantization does
not increase the width of the main lobe. Also, notice that when $\delta_{max} = 180^\circ$, which corresponds
to completely randomizing the phase of each antenna, the normalized array response is $\Phi_N(\pi) = 1/N$,
which reduces the array gain to that of an omni-directional antenna. So a simple way of creating
an omni-like beampattern without reducing the radiated power is to choose the phases randomly.⁴
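
The one-bit figure quoted above generalizes readily. Assuming that a b-bit phase shifter rounds
each ideal phase to the nearest of 2^b uniformly spaced levels, the worst-case error is
δ_max = π / 2^b, which reproduces the 90° value for one bit. The sketch below evaluates the
large-array limit of the normalized gain in dB under that assumption, and additionally treats the
quantization error as uniformly distributed, as in the analysis above. The function name is
hypothetical.

```python
import math


def quantization_loss_db(bits: int) -> float:
    """Large-array gain loss in dB for b-bit phase quantization.

    Assumes the phase error is uniform on [-pi / 2**b, +pi / 2**b].
    """
    delta_max = math.pi / (2 ** bits)
    phi = (math.sin(delta_max) / delta_max) ** 2
    return 10.0 * math.log10(phi)


if __name__ == "__main__":
    for b in range(1, 5):
        print(b, "bit(s):", round(quantization_loss_db(b), 2), "dB")
```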


⁴ With omni-directional antennas, the absolute, non-normalized power of the signals adds.
Figure 2.2: (a) The normalized array gain $\Phi(\delta_{max})$ as a function of the maximum phase error $\delta_{max}$.
(b) The normalized array gain in dB scale, $10\log\Phi(\delta_{max})$. The plot shows both the calculated gain
and the simulated gain for a 10000 element array. (c) The simulated gain (dB) for a 10000 element
array with different phase error distributions. For a uniform distribution, the standard deviation is
$\sigma_\delta = \delta_{max}/\sqrt{3}$.

Figure 2.3: (a)-(f) Array loss as a result of phase error for different size arrays. (g)-(h)
2-dimensional horizontal beampattern of a 100x100 array with $\lambda/2$ spacing (steered towards 55
degrees) for different phase resolutions. Panel titles recoverable from the figure: "100x100 array
with half wavelength spacing" and "100x100 array with quarter wavelength spacing".
A second method of proving a lower bound on the array gain is by using the mean of the
random variable, which is often easier to compute, instead of the mean of the square of the random
variable. By Jensen's inequality, the square of the mean of a random variable is less than or equal
to the mean of its square:

$$E[X]^2 \le E[X^2]$$

for any random variable $X$. More generally:

$$f(E[X]) \le E[f(X)] \quad \text{when } f(\cdot) \text{ is a convex function.}$$

Using this fact, and the expected value of the channel response, we see that:

$$E[H_{opt}] = E\left[\sum_{i=1}^{N} X_i + j \sum_{i=1}^{N} Y_i\right] = N\mu_X = N\,\frac{\sin(\delta_{max})}{\delta_{max}}$$

$$E\left[|H_{opt}|^2\right] \ge E[H_{opt}]^2 = N^2 \left(\frac{\sin(\delta_{max})}{\delta_{max}}\right)^2$$

$$\Phi_N(\delta_{max}) = \frac{E\left[|H_{opt}|^2\right]}{N^2} \ge \left(\frac{\sin(\delta_{max})}{\delta_{max}}\right)^2$$

2.3  Beam-nulling 

In addition to maximizing SNR by steering the beam towards desired locations, many
communication systems must cope with unwanted interference. In most of these systems,
simply steering the direction of the peak is not sufficient to suppress large interferers and signal
jammers. Other techniques, such as null-steering and side lobe suppression, are required to provide
the necessary rejection of interfering signals.

Adaptive systems that require precise control of the locations of the nulls and side lobe levels
need to adjust both the phase and amplitude response of each antenna element. In this case, we
need to account for both phase and amplitude errors. Analyzing the combined effect of phase and
amplitude errors is easier when we consider the problem in the spatial domain where the optimal
complex beamforming weights and channel responses can be represented as complex vectors in the
$N$-dimensional Euclidean space, where $N$ is the number of antennas in the array. Let us assume that
we have $K + 1$ vectors: a desired vector $\mathbf{h}_d$ corresponding to the direction of the desired signal,⁵
and $K$ interfering vectors $\mathbf{h}_i$, $\forall\, 1 \le i \le K$, corresponding to the directions of $K$ interfering signals.

$$\mathbf{h}_d = \left[a_{1d} e^{j\phi_{1d}}, \ldots, a_{Nd} e^{j\phi_{Nd}}\right]^T$$

$$\mathbf{h}_i = \left[a_{1i} e^{j\phi_{1i}}, \ldots, a_{Ni} e^{j\phi_{Ni}}\right]^T \quad \forall\, 1 \le i \le K$$

⁵ We will denote scalars in lower case, vectors in bold lower case, and matrices in bold upper case.
The incoming signal at the input of the array $\mathbf{y}[n]$ is the sum of the desired signal, interference,
and noise:

$$\mathbf{y}[n] = \mathbf{h}_d\, d[n] + \sum_{i=1}^{K} \mathbf{h}_i\, d_i[n] + \mathbf{v}[n]$$

where $d[n]$ is the desired signal, $d_i[n]$ is interfering signal $i$, and $\mathbf{v}[n]$ is the white noise vector at
the receiver (the variance of each component of $\mathbf{v}[n]$ is $\sigma_v^2$). For simplicity, we shall assume that
the desired and interfering signals have the same power. Using beamforming weights $\mathbf{w}$ (without
loss of generality, we can restrict $|\mathbf{w}| = 1$), the signal at the output of the array will be $\mathbf{w}^H \mathbf{y}[n]$. The
output signal to interference plus noise ratio (SINR) is given by:

$$SINR_{out} = \frac{\left|\mathbf{w}^H \mathbf{h}_d\right|^2}{\sum_{i=1}^{K} \left|\mathbf{w}^H \mathbf{h}_i\right|^2 + \sigma_v^2}$$

where $(\cdot)^H$ denotes the complex conjugate transpose. Let $\mathbf{H}_I = [\mathbf{h}_1, \ldots, \mathbf{h}_K]$
be the matrix whose columns are the interference vectors. Complete interference rejection can be
achieved by choosing a beamforming weight vector $\mathbf{w}$ that is the projection of the desired
vector $\mathbf{h}_d$ onto the subspace orthogonal to the column space of $\mathbf{H}_I$ (that is, the
null space of $\mathbf{H}_I^H$, which is also known as the left null space of $\mathbf{H}_I$), as
described in [40]:

$$\mathbf{w}_{opt} = \mathbf{w}_{projection} = \mathbf{h}_d - \mathbf{H}_I (\mathbf{H}_I^H \mathbf{H}_I)^{-1} \mathbf{H}_I^H \mathbf{h}_d$$
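The projection can be illustrated numerically. The following is a minimal sketch, not the
implementation used in this work: the array size, the number of interferers, and the random
Gaussian channel vectors are hypothetical values chosen only for illustration, the result is
normalized to unit norm, and a least-squares solve (`numpy.linalg.lstsq`) stands in for the explicit
inverse $(\mathbf{H}_I^H \mathbf{H}_I)^{-1}$.

```python
import numpy as np

def projection_beamformer(h_d, H_I):
    """Project h_d onto the subspace orthogonal to the columns of H_I, then normalize."""
    # Least-squares coefficients c = (H_I^H H_I)^-1 H_I^H h_d, without forming the inverse.
    coeffs, *_ = np.linalg.lstsq(H_I, h_d, rcond=None)
    w = h_d - H_I @ coeffs
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)      # fixed seed, illustrative only
N, K = 16, 3                        # hypothetical array size and number of interferers
h_d = rng.standard_normal(N) + 1j * rng.standard_normal(N)
H_I = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))

w = projection_beamformer(h_d, H_I)
leakage = np.abs(w.conj() @ H_I)    # |w^H h_i| for each interferer
gain = np.abs(np.vdot(w, h_d))      # |w^H h_d|; vdot conjugates its first argument

print("largest |w^H h_i|:", leakage.max())
print("|w^H h_d|        :", gain)
```

With error-free weights the leakage terms vanish to within floating-point precision, while the
gain towards the desired vector stays nonzero as long as $\mathbf{h}_d$ does not lie in the
interference subspace.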


We can see that rejecting all the interfering signals is only possible when the left null space
contains a nonzero vector. This is guaranteed when $K < N$. The projection-based beamformer does not
take noise into account. In general, maximizing the output SINR does not necessarily require complete
interference rejection; reducing the interference to the noise level may be sufficient. Optimizing
the output SINR leads to the Minimum Variance Distortionless Response (MVDR) beamformer
[25]. If we define the noise-plus-interference correlation matrix $\mathbf{R}_{n+i}$ as:

$$\mathbf{R}_{n+i} = \sum_{i=1}^{K} \mathbf{h}_i \mathbf{h}_i^H + \sigma_v^2 \mathbf{I}_N$$

where $\mathbf{I}_N$ is the $N \times N$ identity matrix, then the output SINR can be maximized by
choosing $\mathbf{w}_{opt}$:

$$\mathbf{w}_{opt} = \mathbf{w}_{MVDR} = \frac{\mathbf{R}_{n+i}^{-1} \mathbf{h}_d}{\mathbf{h}_d^H \mathbf{R}_{n+i}^{-1} \mathbf{h}_d}$$
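As an illustration of the MVDR expression, the sketch below builds $\mathbf{R}_{n+i}$ for a
hypothetical scenario and compares the resulting SINR with that of the projection beamformer. The
array size, interferer count, noise variance, and random channels are assumptions made for this
example only. Because the MVDR weights are not unit norm, the noise term in the SINR is scaled by
$\|\mathbf{w}\|^2$, which reduces to the expression in the text when $\|\mathbf{w}\| = 1$.

```python
import numpy as np

def mvdr_weights(h_d, H_I, sigma_v2):
    """MVDR weights R^-1 h_d / (h_d^H R^-1 h_d), assuming unit-power interferers."""
    n = h_d.shape[0]
    R = H_I @ H_I.conj().T + sigma_v2 * np.eye(n)    # noise-plus-interference correlation
    r_inv_h = np.linalg.solve(R, h_d)                # R^-1 h_d without an explicit inverse
    return r_inv_h / np.vdot(h_d, r_inv_h)           # vdot conjugates its first argument

def output_sinr(w, h_d, H_I, sigma_v2):
    """Output SINR for weights w that are not necessarily unit norm."""
    signal = np.abs(np.vdot(w, h_d)) ** 2
    interference = np.sum(np.abs(w.conj() @ H_I) ** 2)
    noise = sigma_v2 * np.vdot(w, w).real            # equals sigma_v2 when ||w|| = 1
    return signal / (interference + noise)

rng = np.random.default_rng(1)                       # fixed seed, illustrative only
N, K, sigma_v2 = 16, 3, 0.01                         # hypothetical scenario
h_d = rng.standard_normal(N) + 1j * rng.standard_normal(N)
H_I = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))

w_mvdr = mvdr_weights(h_d, H_I, sigma_v2)

# Projection beamformer for comparison (least-squares form of the projection).
coeffs, *_ = np.linalg.lstsq(H_I, h_d, rcond=None)
w_proj = h_d - H_I @ coeffs
w_proj /= np.linalg.norm(w_proj)

sinr_mvdr = output_sinr(w_mvdr, h_d, H_I, sigma_v2)
sinr_proj = output_sinr(w_proj, h_d, H_I, sigma_v2)
print("MVDR SINR (dB)      :", 10 * np.log10(sinr_mvdr))
print("projection SINR (dB):", 10 * np.log10(sinr_proj))
```

In this interference-dominated example the two SINR values are expected to differ only marginally,
with MVDR never falling below projection.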


The denominator of the MVDR expression is a normalizing factor. When the interference power is much
larger than the noise power, projection and MVDR yield virtually identical results. In practice,
however, phase and amplitude errors degrade the performance of both beamformers$^6$. We will assume
that an optimum beamformer $\mathbf{w}_{opt}$ is computed using projection, and that the distorted
beamformer $\hat{\mathbf{w}}_{opt}$ includes both phase and amplitude errors:

$$\mathbf{w}_{opt} = [\alpha_{1w} e^{j\beta_{1w}}, \ldots, \alpha_{Nw} e^{j\beta_{Nw}}]^T$$

$$\hat{\mathbf{w}}_{opt} = [\alpha_{1w}(1+\varepsilon_1) e^{j(\beta_{1w}+\delta_1)}, \ldots, \alpha_{Nw}(1+\varepsilon_N) e^{j(\beta_{Nw}+\delta_N)}]^T$$

where $\varepsilon_i$, $\forall\, 1 \le i \le N$, are i.i.d. zero mean real random variables with variance
$E[\varepsilon_i^2] = \sigma_\varepsilon^2$, and $\delta_i$, $\forall\, 1 \le i \le N$, are i.i.d. zero mean
real random variables with variance $E[\delta_i^2] = \sigma_\delta^2$. We also assume that the phase
and amplitude errors are independent of each other. Furthermore, we scale the weights so that
$\mathbf{w}_{opt}$ has unit norm (i.e. $\sum_{i=1}^{N} \alpha_{iw}^2 = 1$).

6 The errors can also result from uncertainties about the channel responses for both desired and
interfering signals. Up to this point, we have assumed perfect knowledge of the channels.

The phase and amplitude errors result in $\hat{\mathbf{w}}_{opt}$ deviating from $\mathbf{w}_{opt}$ by
an angle $\theta$. Note that $\theta$ does not necessarily correspond to a physical angle or direction.
This deviation will result in a reduction in the signal strength in the desired direction as well as
an increase in the interference power, since $\hat{\mathbf{w}}_{opt}$ will no longer be orthogonal to
the interference subspace. The desired signal component is proportional to $\cos(\theta)$, and the
interference leakage component is proportional to $\sin(\theta)$ (see Figure 2.4). For small angles
$\theta$, we can use the standard approximations$^7$ $\sin(\theta) \approx \theta$ and
$\cos(\theta) \approx 1$. Thus, we can characterize the effect of phase and amplitude errors on beam
nulls by considering how the mean square angle
$\sigma_\theta^2(\sigma_\delta, \sigma_\varepsilon, N) = E[\theta^2]$ behaves as a function of
$\sigma_\delta$, $\sigma_\varepsilon$, and $N$.

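If we assume that the phase and amplitude variations are small, and given that $\mathbf{w}_{opt}$ is
unit norm, then we can approximate the error angle $\theta$ by the norm of the error vector:

$$\Delta\mathbf{w} = \mathbf{w}_{opt} - \hat{\mathbf{w}}_{opt} = [\alpha_{1w} e^{j\beta_{1w}} (1 - (1+\varepsilon_1) e^{j\delta_1}), \ldots, \alpha_{Nw} e^{j\beta_{Nw}} (1 - (1+\varepsilon_N) e^{j\delta_N})]^T$$

We can further simplify the above expression using the approximations $\cos(\delta_i) \approx 1$,
$\sin(\delta_i) \approx \delta_i$, and $\delta_i \varepsilon_i \approx 0$:

$$\Delta\mathbf{w} \approx [\alpha_{1w} e^{j\beta_{1w}} (1 - 1 - \varepsilon_1 - j\delta_1), \ldots, \alpha_{Nw} e^{j\beta_{Nw}} (1 - 1 - \varepsilon_N - j\delta_N)]^T$$

$$= [-\alpha_{1w} e^{j\beta_{1w}} (\varepsilon_1 + j\delta_1), \ldots, -\alpha_{Nw} e^{j\beta_{Nw}} (\varepsilon_N + j\delta_N)]^T$$

$$\Rightarrow \|\Delta\mathbf{w}\|^2 = (\Delta\mathbf{w})^H (\Delta\mathbf{w}) = \sum_{i=1}^{N} \alpha_{iw}^2 (\varepsilon_i^2 + \delta_i^2)$$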

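By taking the expectation of this expression:

$$\sigma_\theta^2 = E\left[\sum_{i=1}^{N} \alpha_{iw}^2 (\varepsilon_i^2 + \delta_i^2)\right] = \sum_{i=1}^{N} \alpha_{iw}^2 \left(E[\varepsilon_i^2] + E[\delta_i^2]\right)$$

$$= \sum_{i=1}^{N} \alpha_{iw}^2 (\sigma_\varepsilon^2 + \sigma_\delta^2) = (\sigma_\varepsilon^2 + \sigma_\delta^2) \sum_{i=1}^{N} \alpha_{iw}^2 = \sigma_\varepsilon^2 + \sigma_\delta^2$$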


7 This explains why nulls are more sensitive than peaks to phase and amplitude errors, since
$\sin(\theta)$ changes more rapidly than $\cos(\theta)$ when $\theta$ is small.

As we can see, the mean square error angle $\sigma_\theta^2$ is equal to the sum of the mean square
phase error $\sigma_\delta^2$ and the mean square amplitude error $\sigma_\varepsilon^2$. The key
conclusion that we draw from this result is that the angle error is independent of $N$, the number of
antennas$^8$.

The simulation results shown in Figure 2.5 verify this result. Figure 2.5(a) shows a linear
relationship between $10\log(\sigma_\varepsilon^2 + \sigma_\delta^2)$ (x-axis) and
$10\log(\sigma_\theta^2)$ (y-axis), with slope equal to 1. Figure 2.5(b) shows that the relationship
between $10\log(\sigma_\delta^2)$ and $10\log(\sigma_\theta^2)$, when the amplitude errors
$\varepsilon_i$ are set to 0, is also linear with slope equal to 1, which demonstrates that the phase
and amplitude errors contribute equally to the overall error angle $\theta$. In both Figures 2.5(a)
and 2.5(b), we simulated a 100 element array. We repeated the same experiment for a 1000 element
array and the results are identical, as shown in Figures 2.5(c) and 2.5(d). Figure 2.6 also shows
identical results, where we plot the interference rejection as a function of phase and amplitude
errors for different array sizes. This shows that the number of antenna elements has no effect on the
mean square error angle $\sigma_\theta^2$ or the interference rejection. This means that the depth of
beam nulls is limited by gain and phase accuracy, and is independent of the size of the array $N$ and
the number of interferers $K$, as long as $N > K$.
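The relation $\sigma_\theta^2 \approx \sigma_\varepsilon^2 + \sigma_\delta^2$ can also be checked with
a short Monte Carlo sketch. This is not the simulation behind Figure 2.5: the Gaussian error
distribution, the random unit-norm weight vectors, the error levels, and the trial count are all
assumptions of this example. Two estimates of $\theta$ are reported, the norm of the error vector
used in the derivation and the geometric angle between the ideal and distorted weight vectors.

```python
import numpy as np

def mean_square_error_angle(n_elements, sigma_eps, sigma_delta, trials=500, seed=0):
    """Return (E[||w - w_hat||^2], E[theta_geometric^2]) over random unit-norm weights."""
    rng = np.random.default_rng(seed)
    shape = (trials, n_elements)
    w = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    w /= np.linalg.norm(w, axis=1, keepdims=True)        # each row is a unit-norm w_opt
    eps = sigma_eps * rng.standard_normal(shape)         # amplitude errors (assumed Gaussian)
    delta = sigma_delta * rng.standard_normal(shape)     # phase errors (assumed Gaussian)
    w_hat = w * (1.0 + eps) * np.exp(1j * delta)         # distorted weights
    # Small-angle estimate from the derivation: theta ~ ||w_opt - w_hat||.
    theta2_vector = np.sum(np.abs(w - w_hat) ** 2, axis=1)
    # Geometric angle between the two vectors.
    cos_theta = np.abs(np.sum(w.conj() * w_hat, axis=1)) / np.linalg.norm(w_hat, axis=1)
    theta2_geometric = np.arccos(np.clip(cos_theta, 0.0, 1.0)) ** 2
    return theta2_vector.mean(), theta2_geometric.mean()

sigma_eps, sigma_delta = 0.05, 0.05                      # hypothetical error levels
for n in (100, 1000):
    vec, geo = mean_square_error_angle(n, sigma_eps, sigma_delta)
    print(n, vec, geo, sigma_eps**2 + sigma_delta**2)
```

Under these assumptions both estimates should stay close to
$\sigma_\varepsilon^2 + \sigma_\delta^2$ for either array size.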

2.4  Conclusion 

Adaptive antenna arrays are a key component in many modern communication systems. They are
used both to increase the gain in the direction of a desired signal and to reject interfering
signals. However, when these adaptive arrays are implemented, a variety of practical considerations
will cause the actual antenna weights to differ from the optimal weights, which in turn degrades
the performance of the array.

In this chapter, we analyzed the performance loss due to phase and amplitude errors in the
weights. We began by considering a beamforming system, which maximizes the gain in a desired
direction. We derived an expression for the loss in gain due to uniform phase errors, and provided
simulations that validate this result. Then, we considered a beam-nulling system, which rejects
interfering signals. We analyzed the effect of uniform amplitude and phase errors, and again
provided numerical simulations. We showed that the interference rejection is a function of the
errors in the weights, and is independent of the number of antennas, assuming that there are more
antennas than interferers.


8 The power leakage into the interference subspace is independent of the number of antennas. However,
increasing the number of antennas can still increase the output SINR (peak to null ratio) by
increasing the power gain of the desired signal. Increasing the number of antennas also increases the
degrees of freedom necessary to null more interferers.
[Figure 2.4 legend: interference/interference subspace; desired signal; optimum beamforming vector;
distorted beamforming vector]

Figure 2.4: The optimum beamforming vector $\mathbf{w}_{opt}$ can be viewed as a projection of the
desired signal onto the subspace orthogonal to the interference subspace. The distorted beamforming
vector $\hat{\mathbf{w}}_{opt}$ can be decomposed into two orthogonal components:
$\hat{\mathbf{w}}_{opt} = \hat{\mathbf{w}}_{opt}^{\parallel} + \hat{\mathbf{w}}_{opt}^{\perp}$.
$\hat{\mathbf{w}}_{opt}^{\parallel}$, which is parallel to $\mathbf{w}_{opt}$, represents the potential
loss in beamforming gain, and is proportional to $\cos(\theta)$. $\hat{\mathbf{w}}_{opt}^{\perp}$, which
is orthogonal to $\mathbf{w}_{opt}$, represents the potential leakage into the interference subspace,
and is proportional to $\sin(\theta)$.
Figure 2.5: Simulated relationship between phase and amplitude errors and the mean square error
angle: (a),(c) $10\log(\sigma_\varepsilon^2 + \sigma_\delta^2)$ on the x-axis, $10\log(\sigma_\theta^2)$
on the y-axis. (b),(d) $10\log(\sigma_\delta^2)$ on the x-axis, $10\log(\sigma_\theta^2)$ on the y-axis.
In (a) and (b), the simulated array had 100 elements. In (c) and (d), the simulated array had 1000
elements.
[Figure 2.6 x-axis label: mean square error (phase + amplitude) (dB)]

Figure 2.6: Simulated power leakage (interference rejection) as a function of phase and amplitude
errors: $10\log(\sigma_\varepsilon^2 + \sigma_\delta^2)$ on the x-axis versus the interference rejection
in dB on the y-axis. Interference rejection (IR) is defined as the ratio of the interferer power after
beam-nulling to the interferer power before beam-nulling. Before beam-nulling, we choose the
beamforming vector $\mathbf{w}_{before}$ along the direction of the desired signal $\mathbf{h}_d$ (i.e.
$\mathbf{w}_{before} = \mathbf{h}_d / \|\mathbf{h}_d\|$). In this case, the input power at the receiver
from interferer $\mathbf{h}_i$ will be $|\mathbf{w}_{before}^H \mathbf{h}_i|^2$. After beam-nulling, we
choose the beamforming vector $\mathbf{w}_{after}$ as the projection of the desired vector onto the
subspace orthogonal to the interference subspace. In this case, the input power at the receiver from
interferer $\mathbf{h}_i$ after beam-nulling (and phase and amplitude distortion) will be
$|\hat{\mathbf{w}}_{after}^H \mathbf{h}_i|^2$. Thus, on the y-axis, the interference rejection is

$$\mathrm{IR} = 20 \log \frac{|\hat{\mathbf{w}}_{after}^H \mathbf{h}_i|}{|\mathbf{w}_{before}^H \mathbf{h}_i|}$$

The relationship is plotted for several array sizes. Both $\mathbf{h}_d$ and $\mathbf{h}_i$ are complex
random vectors whose components (both real and imaginary parts) are sampled independently from a
standard Gaussian distribution.
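The procedure described in this caption can be sketched as follows. This is an illustrative
reconstruction, not the code that produced Figure 2.6: it assumes a single interferer, Gaussian
amplitude and phase errors, hypothetical error levels and array sizes, and it averages the interferer
powers over trials before taking the ratio, a detail the caption does not specify.

```python
import numpy as np

def complex_gaussian(rng, n):
    """Vector whose real and imaginary parts are independent standard Gaussians."""
    return rng.standard_normal(n) + 1j * rng.standard_normal(n)

def interference_rejection_db(n_elements, sigma_eps, sigma_delta, trials=1000, seed=0):
    rng = np.random.default_rng(seed)
    p_before = 0.0
    p_after = 0.0
    for _ in range(trials):
        h_d = complex_gaussian(rng, n_elements)
        h_i = complex_gaussian(rng, n_elements)
        # Before nulling: point the beam along the desired vector.
        w_before = h_d / np.linalg.norm(h_d)
        # After nulling: project h_d onto the subspace orthogonal to h_i, then normalize.
        w_after = h_d - h_i * (np.vdot(h_i, h_d) / np.vdot(h_i, h_i))
        w_after /= np.linalg.norm(w_after)
        # Apply amplitude and phase errors to the nulling weights.
        eps = sigma_eps * rng.standard_normal(n_elements)
        delta = sigma_delta * rng.standard_normal(n_elements)
        w_hat = w_after * (1.0 + eps) * np.exp(1j * delta)
        p_before += np.abs(np.vdot(w_before, h_i)) ** 2
        p_after += np.abs(np.vdot(w_hat, h_i)) ** 2
    return 10.0 * np.log10(p_after / p_before)

sigma_eps, sigma_delta = 0.05, 0.05                      # hypothetical error levels
for n in (16, 128, 1024):
    print(n, interference_rejection_db(n, sigma_eps, sigma_delta))
```

Under these assumptions the rejection should come out near
$10\log(\sigma_\varepsilon^2 + \sigma_\delta^2)$ for every array size, in line with the analysis of
this chapter.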
Chapter 3

Circuit  Design  and  Technology 
Considerations 


3.1  Introduction 

Replacing a traditional single antenna with a phased-antenna array results in a tunable antenna
pattern. Suitable gain and phase weights are applied to each element in order to shape the array
pattern. This can be employed at the transmitter to efficiently transmit the power in the desired
direction and/or at the receiver to spatially filter out unwanted signals. The array gain is
proportional to the number of elements, and output power and receiver sensitivity improve as the
array gets larger, at the expense of a narrower beam. If the whole wafer is dedicated to
demonstrating a large phased array system at mm-wave frequencies (for example 90 GHz), we will have:

$$f = 90\,\mathrm{GHz} \Rightarrow \lambda = c/f \approx 3.3\,\mathrm{mm} \tag{3.1}$$

$$\text{Wafer size} = 300\,\mathrm{mm} \Rightarrow N = \frac{\pi R^2}{(\lambda/2)^2} = 4\pi (R/\lambda)^2 \approx 25000 \tag{3.2}$$

where $R$ is the wafer radius. This large number of elements results in 44 dB of array gain (antenna
directivity). There are 25,000 integrated radiators, which means that the effective radiated output
power is 25,000 times that of a single radiator connected to an antenna with 44 dB of directivity. In
other words, the total gain will be 88 dB with respect to a single element connected to an
omnidirectional antenna. Assuming each radiator is capable of transmitting 10 mW of output power
[15], the Effective Isotropic Radiated Power (EIRP) comes to:

$$\mathrm{EIRP} = P_{tx} N G \approx 6\,\mathrm{MW} \tag{3.3}$$

In reality, on-chip antennas have an efficiency of about 10%, which reduces the effective radiated
output power to 600 kW. This enormous power level opens up new applications and opportunities for
silicon technology. Estimating the efficiency of each transceiver to be 10%, a DC power of about

$$P_{DC} = 10 \cdot 10\,\mathrm{mW} \cdot 25000 = 2.5\,\mathrm{kW} \tag{3.4}$$

should be supplied. This raises additional concerns about the feasibility and reliability of
delivering the DC power to the chip and removing the dissipated heat from it. Fortunately, advances
in TSV (Through-Silicon Via) techniques help. This technology allows metalization on the back side of
the wafer as another means to access the active devices and leaves the front side of the wafer to be
devoted to radiating elements. As a result of the high number of radiating elements, the beam is
extremely narrow, spanning a sub-degree angle, which makes the system more sensitive but also
greatly enhances the resolution for identifying objects.
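The arithmetic behind (3.1)-(3.4) can be reproduced with a few lines. The sketch below uses only the
figures quoted in this section (90 GHz, a 300 mm wafer, half-wavelength element pitch, 10 mW per
radiator, 10% antenna efficiency, 10% transceiver efficiency); the variable names are hypothetical.
Because it keeps the unrounded element count rather than 25,000, its outputs are slightly larger
than the rounded values in the text.

```python
import math

C_LIGHT = 299_792_458.0       # speed of light, m/s
freq = 90e9                   # operating frequency, Hz
wafer_diameter = 0.300        # m
p_element = 10e-3             # output power per radiator, W
antenna_efficiency = 0.10     # on-chip antenna efficiency
transceiver_efficiency = 0.10

wavelength = C_LIGHT / freq                           # (3.1)
radius = wafer_diameter / 2.0
n_elements = math.pi * radius**2 / (wavelength / 2.0)**2   # (3.2), lambda/2 pitch
array_gain_db = 10.0 * math.log10(n_elements)

eirp = p_element * n_elements * n_elements            # (3.3): P_tx * N * G with G = N
eirp_with_antenna_loss = antenna_efficiency * eirp
p_dc = p_element * n_elements / transceiver_efficiency     # (3.4)

print(f"wavelength     = {wavelength * 1e3:.2f} mm")
print(f"elements       = {n_elements:.0f}")
print(f"array gain     = {array_gain_db:.1f} dB")
print(f"EIRP           = {eirp / 1e6:.2f} MW")
print(f"EIRP (10% eff) = {eirp_with_antenna_loss / 1e3:.0f} kW")
print(f"DC power       = {p_dc / 1e3:.2f} kW")
```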


3.2  Phased  Array  Architectures 

Phase shifting can be done in the digital signal processing domain (Fig. 3.1), in the LO frequency
and clock distribution network (Fig. 3.2), or in the RF path (Fig. 3.3), with advantages and
disadvantages associated with each method in terms of die area, power consumption, dynamic range, and
programmability.


Figure  3.1:  A  digital  phase  shifting  architecture. 

Providing the phase shift in the digital domain makes the system highly flexible and exploits
high-speed DSPs to run complex, high-resolution algorithms in the digital domain (Fig. 3.1). However,
in a digital beamformer all building blocks are replicated for each path, making the system
demanding in terms of area and power consumption. Moreover, since the spatial filtering is done
in the DSP, all the building blocks prior to it must have enough dynamic range to cope with
large blocker levels.


Figure  3.2:  An  LO  phase  shifting  architecture. 

Phase  shifting  in  the  LO  and  clock  distribution  network  (Fig.  3.2)  reduces  the  number  of  ADCs 
and  baseband  circuitry  and  relaxes  the  dynamic  range  requirements.  The  spatial  filtering  after  the 
signal  combining  attenuates  interfering  signals,  which  in  general  have  a  different  angle  of  arrival, 
but  moves  the  bottleneck  of  the  system  to  the  LO  distribution  network,  which  should  be  highly 
symmetric  to  provide  the  exact  desired  phase  shift.  The  LO  distribution  network  also  burns  a  lot 
of  power  in  buffer  stages  to  provide  a  strong  enough  LO  signal  to  each  of  the  many  mixers  in  the 
array. 

RF phase shifting offers the simplest system (Fig. 3.3) in terms of component count and power
consumption, at the expense of designing challenging RF phase shifting elements and including their
nonidealities (loss, noise, nonlinearity) directly in the RF path. In the proposed wafer scale radio,
the number of elements is extremely large, and therefore RF phase shifting is the best candidate for
reducing the complexity and component count of the system. As presented in the following sections,
novel architectures along with advances in the performance of silicon technology promise to make it
practical to add phase shifting elements directly to the RF path of the signal. Although providing
the phase shift at RF is less flexible than the digital beamformer in terms of programmability,
simple signal processing such as windowing and spatial spectral filtering can still be done by
manipulating the gain of each path on top of adjusting its phase. Hence, a building block that alters
both the gain and the phase is highly desirable.

Figure 3.3: An RF phase shifting architecture.

As depicted in Fig. 3.4, a triangular window function decreases the gain of the main lobe by
6 dB and broadens the beamwidth, but it greatly reduces the sidelobe levels and minimizes the
effect of signals coming from unwanted directions. By using more complicated gain and phase
vectors, null steering can be done to cope with very strong jammers.
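The trade-off between main-lobe gain and sidelobe level can be illustrated with a short array-factor
calculation. The sketch below is not the simulation behind Fig. 3.4: it assumes a hypothetical
31-element linear array with half-wavelength spacing, a triangular window whose peak weight is 1
(not renormalized), and it evaluates the pattern on a grid of $u = \sin\theta$.

```python
import numpy as np

def array_factor_db(weights, n_grid=4001):
    """|AF| in dB versus u = sin(theta) for a half-wavelength-spaced linear array."""
    idx = np.arange(len(weights))
    u = np.linspace(0.0, 1.0, n_grid)                  # broadside at u = 0
    steering = np.exp(1j * np.pi * np.outer(u, idx))   # phase pi * n * sin(theta)
    af = np.abs(steering @ weights)
    return u, 20.0 * np.log10(np.maximum(af, 1e-12))

def peak_sidelobe_db(af_db):
    """Highest sidelobe relative to the broadside peak."""
    k = 0
    while k < len(af_db) - 1 and af_db[k + 1] < af_db[k]:
        k += 1                                         # walk down the main lobe to its first null
    return af_db[k:].max() - af_db[0]

n_elements = 31                                        # hypothetical array size
uniform = np.ones(n_elements)
triangular = np.bartlett(n_elements + 2)[1:-1]         # triangle without the zero end points

_, af_uniform = array_factor_db(uniform)
_, af_triangular = array_factor_db(triangular)

main_lobe_change_db = af_triangular[0] - af_uniform[0]
psl_uniform = peak_sidelobe_db(af_uniform)
psl_triangular = peak_sidelobe_db(af_triangular)

print("main-lobe change (dB)     :", main_lobe_change_db)
print("peak sidelobe, uniform    :", psl_uniform)
print("peak sidelobe, triangular :", psl_triangular)
```

For this configuration the main lobe drops by roughly 6 dB while the peak sidelobe falls from about
13 dB to about 26 dB below the main lobe.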

Having  signals  from  all  25,000  elements  being  phase  shifted  and  combined  in  the  RF  domain 
raises  severe  issues  about  the  loss  of  the  signal  combiner/divider  and  the  poor  overall  flexibility 
of  the  system.  Therefore  the  final  solution  will  be  a  combination  of  digital  and  RF  phase  shift¬ 
ing  where  the  total  area  is  divided  into  reticles  of  sub-arrays.  In  each  subarray,  beamforming  is 
achieved  by  RF  phase  shifting  and  then  the  sub-arrays  are  connected  and  synchronized  via  the 
relatively  lower  clock  frequency  and  will  be  programmed  digitally.  Beams  of  each  sub-array  can 
be  unique  or  can  be  combined  with  the  beam(s)  of  neighboring  arrays  to  create  a  stronger  beam. 
Multiple  target  tracking  at  different  directions  and  different  frequency  bands  will  be  controlled  by 
the  high-speed  DSPs  that  program  the  sub-arrays. 

3.3  True  Time  Delay  Elements  Versus  Phase  Shifters 

As shown in Fig. 3.5, the wavefront reaches each successive antenna element of a phased array with a
delay of $\frac{d}{c}\sin\theta$, where $d$ is the element spacing, and in order to steer the angle of
look towards the direction of the incoming signal, this delay should be compensated before combining
the signals from each path. Delays correspond to a linear phase shift in the frequency domain. A
linear phase shift can be approximated as a constant phase shift over a narrow bandwidth:

$$\Delta\varphi = \omega \Delta t \approx \omega_0 \Delta t = k_0 d \sin\theta, \quad d = \lambda/2 \Rightarrow \Delta\varphi = \pi \sin\theta \tag{3.5}$$

[Figure 3.4 x-axis label: Angle (deg)]

Figure 3.4: Windowing can improve the array factor of a phased-array by reducing the side-lobe
levels.

As the bandwidth increases, the delay-phase approximation becomes inaccurate. For small arrays
(4-8 elements) the delay-phase approximation works up to 20% fractional bandwidth, but as the
number of array elements gets larger, the aperture size becomes larger and the array becomes more
directive. For a linear $N$-element array

$$D = (N-1)\lambda/2 \Rightarrow \text{Beamwidth} = \lambda/D = \frac{2}{N-1} \tag{3.6}$$


This shows that the beam gets narrower as $N$ increases. A constant phase shift corresponds to a
variable group delay over the band, which in turn results in a different direction of look across
the bandwidth:

$$\theta = \arcsin\left(\frac{\omega_0 \pm \Delta\omega}{\omega_0} \sin\theta_0\right) \tag{3.7}$$

For large arrays with extremely small beamwidth, the overlap between the array patterns of beams
operating at the band edges can be minimal, as depicted in Fig. 3.6. Because
$\Delta\varphi = \pi \sin\theta$ is a nonlinear function of $\theta$, the situation worsens as the
direction of the main beam moves from broadside towards end-fire. All of the above observations show
that, for a wafer-size array, true time delay elements are needed instead of phase shifters.
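To give a feel for the numbers, the sketch below compares the beam pointing shift at a 10% frequency
offset with the beamwidth estimate of (3.6) for 10 and 100 elements. The 30 degree steering angle
is a hypothetical choice. The pointing shift is computed from the phase-matching condition
$\omega \sin\theta = \omega_0 \sin\theta_0$, which is an assumption of this sketch; it agrees with
(3.7) to first order in $\Delta\omega/\omega_0$, up to the sign convention of the offset.

```python
import math

def squinted_angle_deg(theta0_deg, freq_ratio):
    """Beam direction at f = freq_ratio * f0 when the phase shifters are set for theta0 at f0."""
    s = math.sin(math.radians(theta0_deg)) / freq_ratio
    return math.degrees(math.asin(s))

def beamwidth_deg(n_elements):
    """Beamwidth estimate 2 / (N - 1) radians from (3.6), converted to degrees."""
    return math.degrees(2.0 / (n_elements - 1))

theta0 = 30.0          # hypothetical steering angle, degrees from broadside
freq_ratio = 1.10      # 10% above the center frequency

squint = abs(squinted_angle_deg(theta0, freq_ratio) - theta0)
for n in (10, 100):
    bw = beamwidth_deg(n)
    print(f"N = {n:3d}: beamwidth = {bw:5.2f} deg, squint = {squint:4.2f} deg, "
          f"squint/beamwidth = {squint / bw:4.2f}")
```

With these values the pointing shift is a fraction of the beamwidth for the 10-element array but
exceeds it several times over for the 100-element array.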


3.4  True  Time  Delay  Elements 

Traditionally, delay elements are implemented via switched transmission line networks. These
switched passive structures have a high bandwidth and low insertion loss, but they also occupy
a large footprint. In a wafer-scale radio it is desired that the size of the array be dominated by
the antenna elements and that the electronics be small enough to fit in the inter-antenna spacing.
As the electronic circuitry becomes larger and comparable to the antenna size, fewer antennas
will fit on the wafer and the array gain will be reduced.

Figure 3.5: The wavefront of a plane wave impinges at an angle to a phased-array. All signals can
be summed in phase if a uniform time delay is applied to the array elements.

To decrease the size of the delay elements, synthesized (LC) transmission lines have been used in
the literature. With this technique the size of the array can be greatly reduced. To tune the delay
of a section of the artificial transmission line, $\tau_0 = \sqrt{LC}$, the capacitance of the line is
modified by varactor loading. However, the capacitance variation also changes the characteristic
impedance of the line, $Z_0 = \sqrt{L/C}$, so the delay variation is limited (usually to about 20%) in
order not to violate matching requirements.
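The coupling between delay tuning and impedance can be made concrete with a small numerical
example. The section inductance and capacitance below are hypothetical values chosen to give a
50 Ω line; only the relations $\tau_0 = \sqrt{LC}$ and $Z_0 = \sqrt{L/C}$ come from the text. The sketch
increases the delay by 20% through the capacitance alone and reports the resulting impedance and the
return loss against the original 50 Ω.

```python
import math

def line_section(inductance, capacitance):
    """Delay and characteristic impedance of one artificial transmission line section."""
    return math.sqrt(inductance * capacitance), math.sqrt(inductance / capacitance)

L_SECTION = 125e-12        # hypothetical section inductance, H
C_NOMINAL = 50e-15         # hypothetical section capacitance, F

tau0, z0 = line_section(L_SECTION, C_NOMINAL)

delay_scale = 1.2                                   # 20% more delay via the varactor only
c_tuned = C_NOMINAL * delay_scale**2                # delay scales with sqrt(C)
tau1, z1 = line_section(L_SECTION, c_tuned)

reflection = abs(z1 - z0) / (z1 + z0)               # mismatch against the nominal impedance
return_loss_db = -20.0 * math.log10(reflection)

print(f"nominal: tau = {tau0 * 1e12:.2f} ps, Z = {z0:.1f} ohm")
print(f"tuned  : tau = {tau1 * 1e12:.2f} ps, Z = {z1:.1f} ohm")
print(f"return loss against nominal impedance = {return_loss_db:.1f} dB")
```

In this example a 20% delay increase already lowers the impedance from 50 Ω to about 42 Ω, which
illustrates why capacitance-only tuning is restricted to a narrow range.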

In our work we developed a technique in which both the capacitance and the inductance of the line
are adjustable. As a result, the $Z_0$ of the line is kept relatively constant while the delay of
each section is varied over a wide range. The inductance of a single loop is a function of its
geometry and of the permeability of the surrounding materials, both of which are fixed after IC
fabrication and therefore provide no means for inductance tuning. However, since the inductance of a
loop is defined as the total magnetic flux passing through the loop divided by the loop current
($L = \Phi/I$), the current in another nearby loop can be manipulated to alter the flux in the desired
main loop. Therefore a transformer can be used in which the secondary current is a scaled copy of
the primary current, and the total effective inductance seen at the primary is given by

$$I_2 = n \cdot I_1 \Rightarrow L_{effective} = L_1 + n \cdot M \tag{3.8}$$
Figure 3.6: Simulated beamwidth for a 10 element and a 100 element array at the center of the band
(where the phase shift is ideal) versus 10% away from the center. A large array shows a high
sensitivity to frequency.

For a 1:1 transformer

$$L_1 = L_2 = L \Rightarrow M = k \cdot L, \quad L_{effective} = (1 + n \cdot k) L \tag{3.9}$$

Fig. 3.7 shows a transformer embedded in a switching network that provides the capability
of reversing the current direction in the secondary ($n = \pm 1$). Therefore the effective inductance
seen by each loop can be either $L_{low} = (1-k)L$ or $L_{high} = (1+k)L$. If the capacitance ratio
matches this inductance ratio, $C_{max}/C_{min} = L_{high}/L_{low} = \frac{1+k}{1-k}$, then the delay
variation will be $\tau_{high}/\tau_{low} = \frac{1+k}{1-k}$.

Figure 3.7: The proposed switched-transformer delay element.

For $k = 0.5$, which is easily achievable in integrated transformers on silicon, a delay ratio
of $\tau_{high}/\tau_{low} = 3$ will be achieved (which is much larger than the 20% traditionally
obtained) while maintaining matching requirements over the bandwidth, $Z_{high} = Z_{low}$.
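The delay ratio and the constant impedance can be verified numerically. In the sketch below the
coupling factor $k = 0.5$ and the relations $L_{low} = (1-k)L$, $L_{high} = (1+k)L$, and
$C_{max}/C_{min} = (1+k)/(1-k)$ come from the text, while the self-inductance and the 50 Ω target
impedance are hypothetical values chosen for illustration.

```python
import math

def line_section(inductance, capacitance):
    """Delay sqrt(LC) and characteristic impedance sqrt(L/C) of one line section."""
    return math.sqrt(inductance * capacitance), math.sqrt(inductance / capacitance)

k = 0.5                     # transformer coupling factor from the text
L_SELF = 100e-12            # hypothetical self-inductance of each winding, H
Z_TARGET = 50.0             # hypothetical target impedance, ohm

l_low = (1.0 - k) * L_SELF                  # secondary current reversed (n = -1)
l_high = (1.0 + k) * L_SELF                 # secondary current aiding (n = +1)
c_min = l_low / Z_TARGET**2                 # capacitance that gives Z_TARGET in the low state
c_max = c_min * (1.0 + k) / (1.0 - k)       # capacitance ratio matched to the inductance ratio

tau_low, z_low = line_section(l_low, c_min)
tau_high, z_high = line_section(l_high, c_max)

print(f"low state : tau = {tau_low * 1e12:.2f} ps, Z = {z_low:.1f} ohm")
print(f"high state: tau = {tau_high * 1e12:.2f} ps, Z = {z_high:.1f} ohm")
print(f"delay ratio = {tau_high / tau_low:.2f}")
```

Switching the inductance and capacitance together triples the delay in this example while the
characteristic impedance stays at the target value in both states.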
Series voltage switching mandates placing a MOS switch in series with the transformer, which
adds to the loss of the network. To make the loss negligible, the switches become prohibitively
large and their parasitic capacitances limit the bandwidth. Simulations verified by measurement
results show that the series switching network is functional up to 10 GHz in 90 nm CMOS (Fig. 3.8).


[Figure 3.8 plot: LL mode, LH mode, and HH mode curves versus frequency (GHz)]

Figure 3.8: Measurement results for a switched transformer based delay element.

To improve the operating frequency, a CML-like current mode switching network is adopted
(Fig. 3.9). Having parallel transistors on the secondary side of the transformer to provide the
switching function converts the structure into a differential cascode amplifier with a switchable
artificial transmission line section between the top and bottom transistors. Hence a variable delay
amplifier (VDA) is achieved. As the signal passes through this building block, it is amplified and
the desired delay is applied to it.

Simulation results for such a structure in the 0.13 µm IBM 8HP SiGe technology show that a
cascade of three of these amplifiers provides 24 dB of gain in a 30 GHz bandwidth around 60 GHz while
providing 12 ps of delay in 4 ps steps.
[Figure 3.9 labels: cascode amplifier with tunable transmission line; output transformer]

Figure 3.9: Schematic of the proposed Variable Delay Amplifier (VDA) architecture.

3.5  Conclusion 

Issues and benefits of large phased array systems were discussed. Different schemes for a phase
shifting system were studied and a combination of RF and digital phase shifting was proposed. To be
applicable to non-narrowband and large arrays, true time delays should be used rather than phase
shifting approximations. Different tunable delay mechanisms were explored and, in order not to
limit the size of the array by the delay elements, artificial transmission lines with tuning
capability on both the capacitance and the inductance of the line were studied. This resulted in a
new variable delay amplifier architecture with promising simulated performance.
3.6 Wideband Distributed Power Amplifier Design Based on
Device Size and Output T-Line Impedance Tapering

3.7  Introduction 

Wideband amplifiers are critical building blocks for future wafer-scale radio front-ends. Depending
on the application, a bandwidth of 80 GHz or even higher is desired. In analog design, amplifier
bandwidth is limited by parasitic capacitances. One popular approach to high frequency design is to
use tuned amplifiers, where the parasitic capacitors are resonated out by inductors at the desired
frequency, but these amplifiers are inherently narrowband, especially when the Q of the resonant
network is large. On the other hand, the cutoff frequency of an artificial transmission line can be
very high, often in excess of 100 GHz, as long as the distributed inductance and capacitance are
small in each section. Therefore, by absorbing the transistor parasitic capacitance into artificial
transmission lines, distributed amplifiers can be built with much higher bandwidth. In recent years,
a number of silicon based DAs (Distributed Amplifiers) have been reported with bandwidth approaching
100 GHz and decent power gain. Yet one major problem of these DAs is their low power efficiency,
which prevents their use for power amplification. In this project, circuit techniques and design
considerations are investigated in order to maximize the DA efficiency. A simultaneous device size
and output T-Line impedance tapering technique is proposed to improve the efficiency of these
extremely wideband structures without gain or bandwidth degradation.


3.8  Distributed  Power  Amplifier  Design 

3.8.1  Concept  of  device  size  and  output  T-Line  impedance  tapering 

In a conventional distributed amplifier, the input signal travels along the input transmission line
and is amplified by the gain cells. The amplified signals then add constructively as they
travel towards the load. As a consequence, the largest voltage swing occurs at the last gain stage,
since it is the sum of the voltage swings of all previous stages. This means that, at the saturated
power level, only the last stage experiences the maximum allowed voltage swing, usually defined by
the supply voltage, while the previous gain stages never reach that level. Since power is the
product of voltage and current, if more voltage swing is utilized in the previous stages, less
current swing is needed to produce the same power, which in turn means less bias current is needed
for them. To achieve this, the output T-Line characteristic impedance should gradually increase from
the load to the termination resistor, while the gain cell device size and the cell bias current
should decrease in the same direction. The basic structure is illustrated in Fig. 3.10. Since less
current is used to produce the same output power level, the overall efficiency can be improved. To
prove the concept, two types of distributed power amplifiers are designed, one based on cascode
gain cells and the other based on common-emitter gain cells. Both of them are designed and
simulated in IBM's 0.13 µm SiGe BiCMOS process. The following two sub-sections discuss the design
procedure in detail.

[Figure 3.10 label: output T-Line impedance]

Figure 3.10: Tapered distributed power amplifier structure.


3.8.2  General  design  guidelines  for  distributed  amplifier  gain  cells 

For  a  distributed  amplifier,  it  is  desirable  to  design  gain  cells  with  small  input  conductance  and 
susceptance.  Small  input  conductance  reduces  the  shunt  loss  of  the  input  transmission  line  while 
small  susceptance  reduces  the  length  of  the  input  transmission  line  which  in  turn  reduces  the  series 
loss. Unfortunately, in this technology, both the input conductance and the input susceptance of a
bipolar transistor are very large. At 60 GHz, for instance, the input conductance is nearly 25 mS
while the equivalent input capacitance is around 80 fF. This contributes significant high-frequency
loss on the input transmission line, which eventually limits the bandwidth. One way to deal with
this problem is to use resistive emitter degeneration: both the conductance and the susceptance
decrease as the degeneration resistor value increases. In addition, the degeneration resistor
effectively prevents thermal runaway of the bipolar transistors. With the help of emitter
degeneration, the 3-dB bandwidth of the DA can reach 80 GHz. To further enhance the bandwidth,
high-frequency zeros can be added to the transfer function by placing degeneration capacitors in
parallel with the degeneration resistors. However, it is important to point out that large emitter
degeneration capacitors result in a negative real part of the input impedance and can therefore
cause instability. To ensure stable operation, a capacitor value of 20 fF is chosen. Since this
capacitance is relatively small, it can be implemented by overlapping the two bottom metal layers
of the process.
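
The trends described above can be reproduced qualitatively with a simplified hybrid-pi model. The sketch below is only an illustration: the device values (gm, r_pi, C_pi, r_b) are hypothetical placeholders rather than values extracted from the process, the base-collector capacitance is neglected, and the 200 fF case is chosen only to show the sign change of the input resistance under these assumed values.

```python
import math

def input_impedance(freq_hz, r_e=0.0, c_e=0.0,
                    gm=0.08, r_pi=500.0, c_pi=80e-15, r_b=10.0):
    """Base input impedance of a common-emitter cell with emitter degeneration.

    Simplified hybrid-pi model. gm, r_pi, c_pi and r_b are illustrative
    placeholders, not values extracted from the SiGe process.
    """
    w = 2 * math.pi * freq_hz
    z_pi = 1 / (1 / r_pi + 1j * w * c_pi)       # r_pi in parallel with C_pi
    if r_e > 0:
        z_e = 1 / (1 / r_e + 1j * w * c_e)      # R_E in parallel with C_E
    else:
        z_e = 0.0                               # no degeneration
    # Seen from the base, the emitter network is multiplied by (1 + gm * z_pi).
    return r_b + z_pi + (1 + gm * z_pi) * z_e


f = 60e9
# Input conductance G and susceptance B versus degeneration resistance.
for r_e in (0.0, 10.0, 20.0):
    y = 1 / input_impedance(f, r_e=r_e)
    print(f"R_E = {r_e:4.1f} ohm: G = {y.real * 1e3:5.2f} mS, B = {y.imag * 1e3:5.2f} mS")

# Real part of the input impedance for a small and a large bypass capacitor.
for c_e in (20e-15, 200e-15):
    z = input_impedance(f, r_e=20.0, c_e=c_e)
    print(f"C_E = {c_e * 1e15:5.1f} fF: Re(Z_in) = {z.real:6.2f} ohm")
```

With these assumed values, both G and B fall as R_E grows, and the real part of the input impedance stays positive for the small capacitor but turns negative for the large one. The exact crossover depends entirely on the placeholder device parameters.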

3.8.3  Cascode  based  distributed  power  amplifier 

One benefit of using a cascode gain cell is that it has better reverse isolation than other
structures. As a result, the input and output T-lines can be designed independently. Since this is
a power- and efficiency-oriented design, the bias voltage and current are determined by the desired
output saturation power level as well as the optimum load impedance. Without an impedance
transformation network at the output, which is by nature a narrow-band circuit, the optimum load
impedance is 25 Ω. This is because the output current can travel in both the forward and reverse
directions, so the gain cell is effectively loaded by two 50 Ω resistors in parallel. The number of
stages is also optimized for minimum transmission line loss. Both the input and output transmission
lines are implemented as microstrip structures.


Figure 3.11: Lumped element model of a distributed amplifier.

To facilitate the design of the tapered transmission line, lumped elements are used to model
the distributed amplifier, as shown in Fig. 3.11. Though it is a first-order analysis, it provides
sufficient insight into how to optimize the voltage and current scaling factors between gain stages.
Assuming the output transmission line inductance scales by a factor of g from stage to stage and
the device size scales by a factor of k, the impedance seen by each stage can be expressed in terms
of the load impedance. The power generated by each stage is equal to the square of the current
multiplied by the impedance. The relation between the power generated by one stage of a tapered DA
and the power generated by one stage of the corresponding uniform DA (no scaling between stages)
can then be determined. In order to maintain the same output power as a uniform DA, the product
$k^3 g$ must be equal to one, which tells one how to simultaneously scale the output impedance and
the device size.

Figure 3.12: Output saturation power as a function of tapering coefficient k.

Fig. 3.12 and Fig. 3.13 plot the output saturation power and peak drain efficiency as a function
of the tapering coefficient k. As k decreases, which means increased tapering between stages, the
output power remains relatively constant but the efficiency increases as a result of the reduced
bias current. However, if k becomes too small, the output power eventually starts to roll off and
so does the efficiency. This is mainly because the mismatch between stages becomes significant and
the output signal experiences large reflections as it travels along the output T-Line. The optimum
tapering coefficient is around 0.955 for an 8-stage DA. In practice, it is hard to achieve a very
high loaded T-Line impedance, so the actual tapering coefficient is slightly greater than the
optimum value. The complete schematic is shown in Fig. 3.14.
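
To give a feel for what a tapering coefficient of this size implies, the following sketch tabulates relative per-stage quantities. It rests on assumptions that go beyond the text: the per-section line capacitance is taken to scale with device size k, the stage current is taken to scale with k, the line impedance is sqrt(L/C), and the inductance scale is set to g = 1/k^3 so that the first-order power per stage stays constant. The function name and the choice of stage 0 as the unscaled reference are hypothetical.

```python
def taper_schedule(n_stages, k):
    """Relative device size, line inductance, line impedance and power per stage.

    First-order sketch only: assumes g = k**-3, line capacitance ~ device size,
    and stage current ~ device size. Stage 0 is the unscaled reference stage.
    """
    g = k ** -3
    rows = []
    for n in range(n_stages):
        size = k ** n                              # relative device size (and current)
        inductance = g ** n                        # relative line inductance
        impedance = (inductance / size) ** 0.5     # sqrt(L/C) with C ~ size
        power = size ** 2 * impedance              # I^2 * Z relative to stage 0
        rows.append((n, size, inductance, impedance, power))
    return rows


rows = taper_schedule(8, 0.955)
for n, size, inductance, impedance, power in rows:
    print(f"stage {n}: size {size:.3f}, L {inductance:.3f}, Z {impedance:.3f}, P {power:.3f}")

total_size = sum(r[1] for r in rows)
print(f"total relative bias current: {total_size:.2f} (uniform DA: {len(rows)})")
```

Under these assumptions the line impedance has to grow by roughly a factor of two across eight stages, which is consistent with the remark that a very high loaded T-Line impedance is hard to realize, while the summed device size (a proxy for total bias current) drops below that of the uniform design.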

3.8.4  Common-emitter  based  distributed  power  amplifier 

In spite of better reverse isolation, which makes the design easier, cascode gain cells have lower
efficiency since the voltage swing is limited by the knee voltage of the cascode device. In
comparison, common-emitter gain cells usually provide better efficiency. However, common-emitter
stages are notorious for poor reverse isolation, so circuit techniques are needed to overcome the
problem. One way to deal with the resulting poor stability is to use cross-coupled capacitors,
which neutralize the differential pair. This also reduces the input capacitance since the Miller
effect is canceled to first order. In addition, the output T-Lines are implemented as coupled strip
lines without a ground plane. This provides a much higher odd-mode characteristic impedance than
standard microstrip lines and much better common-mode rejection, since the only return current path
for even-mode signals is the substrate. The scaling factors between stages are determined from the
same analysis presented in the previous subsection, and the optimum tapering coefficient k is 0.94
for a 6-stage common-emitter DA. The complete schematic is shown in Fig. 3.15.

Figure 3.13: Peak drain efficiency as a function of tapering coefficient k.


3.9  Simulated  Performance 

3.9.1  Cascode  based  distributed  power  amplifier  performance 

Fig. 3.16 shows the simulated S-parameters of the cascode distributed power amplifier. The
small-signal gain is 10.3 dB and the 3-dB bandwidth is 110 GHz. The amplifier is unconditionally
stable at all frequencies. Fig. 3.17 and Fig. 3.18 plot the simulated large-signal performance of
the power amplifier. For comparison, a uniform distributed power amplifier is also designed in the
same technology; its performance is plotted with dashed curves. In terms of output saturation
power, the tapered DA matches its uniform counterpart reasonably well at any given frequency, which
shows that the tapering concept works effectively. The tapered DA has a somewhat higher output P1dB
at high frequency, mainly because smaller devices are used in the front and less distortion is
introduced. In terms of drain efficiency, as a result of the reduced total bias current, the
tapered DA performs better than the uniform DA at all frequencies. The peak drain efficiency is
greater than 10% up to 90 GHz.

Figure 3.14: Schematic of the tapered cascode distributed power amplifier.

3.9.2  Common-emitter  based  distributed  power  amplifier  performance 

Fig. 3.19 shows the simulated S-parameters of the common-emitter distributed power amplifier.
The small-signal gain is 7.4 dB, the 3-dB bandwidth is 113 GHz, and the amplifier is
unconditionally stable. Fig. 3.20 and Fig. 3.21 show the simulated large-signal performance. Again,
a uniform counterpart is designed for comparison, and it can be clearly seen that in terms of
output saturation power the two DAs are very close, while the peak efficiency of the tapered DA is
better than that of the uniform one at any given frequency. The peak efficiency is greater than 17%
up to 90 GHz. Compared to the cascode DA, the efficiency of the common-emitter DA is significantly
enhanced.

Figure 3.15: Schematic of the tapered common-emitter distributed power amplifier.

3.10  Conclusion 

In conclusion, distributed power amplifiers with bandwidths in excess of 100 GHz have been
demonstrated. These types of amplifiers pave the way for future integration of extremely wideband
systems such as wafer-scale radios. A systematic approach to power and efficiency optimization
has been analyzed. In particular, a device size and output T-Line impedance tapering technique
has been proposed to enhance the power efficiency of distributed amplifiers, overcoming a major
problem with traditional DAs. The concept is verified by simulating both cascode and
common-emitter distributed power amplifiers in IBM's SiGe BiCMOS process.

Figure 3.16: Simulated S-parameters of the cascode distributed power amplifier.

Figure 3.17: Simulated output power of cascode distributed power amplifiers as a function of
frequency.

Figure 3.18: Simulated drain efficiency of cascode distributed power amplifiers as a function of
frequency.

Figure 3.19: Simulated S-parameters of the common-emitter distributed power amplifier.

Figure 3.20: Simulated output power of common-emitter distributed power amplifiers as a function
of frequency.

Figure 3.21: Simulated drain efficiency of common-emitter distributed power amplifiers as a
function of frequency.

Chapter 4

On-Chip  Antenna  and  Phased-Array 
Performance 

4.1  Introduction 

Integrated antennas on silicon are a promising technology at millimeter-wave frequencies. The
size of an antenna unit element can be made comparable to that of traditional bond-wire pads. An
integrated antenna can therefore be a cost-effective solution compared to conventional packaging.
Moreover, packaging the antenna with transceivers causes large insertion loss in the
millimeter-wave range. In addition, a fully integrated system on a single chip provides extra
design flexibility, since the antenna can be co-designed with the transceivers to achieve broader
spatial coverage, wider bandwidth, and better beam-shaping characteristics.

The main challenge for integrated antennas on a silicon substrate is the low radiation efficiency,
which is generally less than 10% [38]. There are two main reasons for such a low radiation
efficiency. One is the conduction loss owing to the low resistivity of the silicon substrate, and
the other is the excitation of surface-wave modes caused by the thick, high-permittivity silicon
substrate [31]. To mitigate the effect of low resistivity, high-resistivity (HR) substrates have
been investigated [11] [28]. The SOI substrate is a good example of an HR substrate, which makes it
possible to achieve a fully integrated transceiver with an on-chip antenna having relatively high
radiation efficiency. However, an SOI substrate is not a cost-effective choice for an on-chip
antenna. One interesting way to obtain an HR substrate from a low-resistivity silicon substrate is
proton implantation using a cyclotron ion source [13]. However, this method can damage the
semiconductor active layers. Another approach to achieving high radiation efficiency is to use
MEMS technology. Using MEMS, the lossy silicon substrate is replaced with a dielectric membrane to
mitigate both conductive loss and surface-wave excitation [2]. Another, more exotic, example using
MEMS technology is a patch antenna with an air substrate, which achieves a radiation efficiency of
94% [30]. However, MEMS technology requires various extra process steps beyond a conventional CMOS
technology, which prohibits full on-chip integration with CMOS ICs. Moreover, structural
robustness is usually sacrificed for the improved radiation efficiency.

Figure 4.1: Various approaches to improve radiation efficiency: HR substrate (a-[31], b-[13]),
MEMS technology (c-[2], d-[30]), substrate/superstrate dielectric lens (e-[39], f-[4]), dielectric
resonator (g-[12]). (h) [38] is a typical example of a native antenna on a lossy Si substrate, a
differential dipole antenna with a radiation efficiency of around 10%.

In terms of surface-wave excitation, the substrate dielectric lens is widely used in
millimeter-wave applications where the substrate is thick enough to cause multimode surface-wave
excitation [36, 9, 39]. The dielectric lens confines the electric field, which prevents
surface-wave excitation. Moreover, the dielectric lens can improve the antenna gain by confining
the field to a certain direction. The main disadvantages of a dielectric lens are the large lens
size and the difficulty of fabricating it. To overcome the fabrication difficulty of the substrate
lens, superstrate dielectric layers over the on-chip antenna have been introduced [4]. Another type
of antenna that avoids surface-wave excitation is the dielectric resonator antenna [12]. For this
type of antenna, most of the electric field is confined within a high-permittivity dielectric
resonator, so that the conduction loss in the lossy silicon substrate is mitigated. However, this
type of antenna has a critical limitation in terms of fabrication, since it requires an extra
process step and precise alignment of the dielectric resonator.

Table 4.1 summarizes several different approaches to improving the radiation efficiency (see also
Fig. 4.1). From the table, we conclude that the two main factors behind low radiation efficiency
could be mitigated by applying wafer-thinning technology. This approach takes advantage of the
native antenna design, which is cost-effective and easy to integrate with transceivers on a single
chip. Moreover, wafer thinning is required for advanced packaging technology to reduce the
insertion loss owing to wire-bonding.

Table 4.1: Reported design approaches for the radiation efficiency improvement.

| Approach | Advantages | Disadvantages |
|---|---|---|
| Native antennas | Low cost / ease of fabrication; best for full integration | Low radiation efficiency; limited antenna gain; main-lobe distortion due to substrate modes in mmW range |
| Modified antennas: HR Si substrate | Very good for full integration; moderate radiation efficiency | Needs SOI substrate; main-lobe distortion (needs thinning); possible damage on IC for the proton implantation method |
| Modified antennas: MEMS | High radiation efficiency | Bad structural robustness; bad for full integration |
| Modified antennas: substrate dielectric lens | Widely used in mmW range; applicable to various types of antennas; high directivity; high radiation efficiency | High cost; large lens size; difficult to fabricate; extra process |
| Modified antennas: superstrate dielectric lens | Moderate radiation efficiency; good for full integration | Extra process; not yet widely used |
| Modified antennas: dielectric resonator antenna | High radiation efficiency; good for full integration | Extra process; not yet widely used |


4.2  Antenna  Design  Considerations 


4.2.1  Antenna  Radiation  Efficiency 


The input impedance of an antenna is computed from $Z_{in} = V/I(0) = R_a + jX_a$, where $I(0)$ is
the current at the input terminals. The antenna radiation efficiency is given by

$$\eta = \frac{R_{rr}}{R_a} \tag{4.1}$$

where $R_a = R_r + R_\Omega$ is the real part of the antenna input impedance. When the substrate is
thin enough, the resonant input resistance is approximated by the radiation resonant resistance
$R_{rr}$. However, when surface waves are excited, the surface-wave resonant resistance $R_{rs}$
has to be considered, i.e., $R_r = R_{rs} + R_{rr}$, since $R_{rs}$ and $R_{rr}$ are directly
proportional to the power coupled into the substrate as guided modes and to the power radiated
into space, respectively [5]. Therefore, the antenna radiation efficiency can be expressed as
follows:

$$\eta = \frac{R_{rr}}{R_{rr} + R_{rs} + R_\Omega} \tag{4.2}$$

In order to improve the radiation efficiency, both $R_\Omega$ and $R_{rs}$ have to be minimized.
Based on this expression, HR Si substrate technology falls into the group that reduces $R_\Omega$,
and substrate/superstrate dielectric lens technology belongs to the group that reduces $R_{rs}$.
MEMS technology and wafer-thinning technology reduce both $R_\Omega$ and $R_{rs}$. Therefore, the
wafer-thinning approach could be a good candidate for an on-chip antenna on a lossy silicon
substrate without modifying the antenna structure.
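
As a small numerical illustration of Eq. (4.2), the sketch below evaluates the efficiency for a few combinations of resistances. The function name and all resistance values are hypothetical and chosen only to show how reducing the ohmic term, the surface-wave term, or both changes the result; they are not simulation data from this work.

```python
def radiation_efficiency(r_rr, r_rs, r_ohm):
    """Eq. (4.2): eta = R_rr / (R_rr + R_rs + R_ohm), all resistances in ohms."""
    total = r_rr + r_rs + r_ohm
    if total <= 0:
        raise ValueError("total resistance must be positive")
    return r_rr / total


# Hypothetical resistance values for illustration only.
cases = {
    "thick lossy substrate":       (10.0, 45.0, 45.0),
    "reduced ohmic loss only":     (10.0, 45.0, 5.0),
    "reduced surface wave only":   (10.0, 5.0, 45.0),
    "both reduced (e.g. thinned)": (10.0, 5.0, 5.0),
}
for name, (r_rr, r_rs, r_ohm) in cases.items():
    print(f"{name:30s} eta = {radiation_efficiency(r_rr, r_rs, r_ohm):.2f}")
```

Because both loss terms sit in the same denominator, removing only one of them leaves the efficiency limited by the other, which is the argument for approaches that reduce both.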


4.2.2  Wafer-Thinning  Technology 

Wafer thinning for advanced packaging methods has gained importance as demand has increased
for memory cards, portable computing systems, multiple chip packages (MCPs), and other
applications that require thin integrated circuits (ICs) [37]. Although thinning wafers to final
thicknesses below 100 µm is a significant challenge, semiconductor packaging groups have developed
several thinning technologies that achieve wafers thinner than 100 µm with high reliability.
Table 4.2 lists widely used wafer-thinning technologies that relieve the stress which causes die
cracking and die breaking. In terms of antenna performance, radiation efficiency can be
effectively improved by thinning the lossy, high-permittivity substrate. Moreover, an antenna on a
thinned wafer has less radiation pattern distortion because surface-wave excitation is minimized;
surface waves cause distortion (ripple) in the radiation pattern.

Table 4.2: Summary of the widely used wafer-thinning technology.

| Technology | Description | Advantage | Disadvantage |
|---|---|---|---|
| Mechanical grinding | Two-step process: coarse grinding (5 µm/s), then fine grinding (<1 µm/s) | Fastest processing; low cost | Largest damage; defect structures remain; rough thickness tolerance; not applicable to very thin wafers |
| Chemical mechanical polishing (CMP) | Polishing based on buffered silica slurries (a few µm/min) | Very flat surfaces; total thickness variation (TTV) is low | Slowest thinning rate; applicable for 200 µm or more thickness depending on the wafer size |
| Wet etching (spin etching) | Front surface has to be protected; mixture of HF and HNO3 | Very flat surfaces comparable to CMP; mass production | MCL (minority carrier lifetime) is much higher than dry etching, comparable to CMP |
| Atmospheric downstream plasma dry chemical etching (ADP-DCE) | Uses Ar/CF4 plasma (20 µm/min) | Uniformity <2% for 20 µm etching; faster than CMP; less damage than mechanical grinding | Electrically active defects form near the plasma-etched surface |

4.2.3 Antenna Unit Element Design

In order to integrate an antenna on a silicon chip, good radiation efficiency is one of the
critical issues. Another issue is the size of the radiator. Considering the silicon integrated
circuit process, planar antennas such as patch, dipole or monopole, and slot antennas are good
candidates for the radiating element. Among them, the widely used circular disk monopole antenna
and two types of slot antennas (folded slot and bow-tie slot) are discussed in detail.

Figure 4.2: Designed circular disk monopole antenna (diameter = 700 µm).

Circular Disk Monopole Antenna

The circular disk monopole antenna is widely used for ultra-wideband applications owing to its low
group delay variation over a wide bandwidth [24]. It has been demonstrated that an optimal design
of this type of antenna can achieve an ultra-wide bandwidth with satisfactory radiation properties.
The performance of this type of antenna and its frequency-domain characteristics depend mostly on
the feed gap, the width of the ground plane, and the dimension of the disc. The first resonant
frequency is directly associated with the dimension of the circular disc because the current is
mainly distributed along the edge of the disc [24]. A circular monopole antenna was designed in a
0.13 µm BiCMOS process with an oxide thickness of 12 µm. The top metal layer (Metal 6), with a
thickness of 3 µm, was used for the antenna structure. The width of the ground plane was chosen to
be 400 µm. Fig. 4.2 shows the designed circular monopole antenna with a diameter of 700 µm.

Folded Slot Antenna

In an array, a slot antenna is more practical than a printed dipole antenna because slots couple
to the dominant TM0 mode along the direction perpendicular to their axis (broadside), which can
alleviate the complexity of the required feed network. Moreover, many slots can be integrated in
the same ground plane, so they can be easily integrated with amplifiers using coplanar waveguide
(CPW) transmission lines. The folded slot antenna is another popular type of antenna that can be
easily fed by CPW. Usually, the circumference of the folded slot is designed to be approximately
equal to one guided wavelength ($\lambda_g$) [42]. CPW feeding allows easy integration of
three-terminal devices or MMICs for microwave amplification and reception. The folded slot antenna
has a broad bandwidth and a radiation pattern with maximum radiation at broadside. From Babinet's
principle [1], the input impedance of complementary antennas can be calculated by

$$Z_{slot} = \frac{\eta_0^2}{4 Z_{dipole}} \approx 500\,\Omega \tag{4.3}$$

where $\eta_0$ is the intrinsic impedance of free space. In order to reduce $Z_{slot}$ and to
realize an appropriate matching network, several stubs can be included in the slot, forming the
complement of the N-element dipole antenna ($Z_{in,N} = N^2 Z_{dipole}$). Therefore, the input
impedance of an N-element slot antenna is reduced to

$$Z_{slot,N} = \frac{\eta_0^2}{4 N^2 Z_{dipole}} \tag{4.4}$$
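
Equations (4.3) and (4.4) can be checked with a few lines of arithmetic. The sketch below assumes free-space conditions and a resonant half-wave dipole impedance of about 73 Ω; both are textbook assumptions rather than values stated in the text, and the function name is hypothetical. An on-chip slot sees a dielectric half-space, so the actual impedance will differ.

```python
ETA_0 = 376.730313668  # intrinsic impedance of free space in ohms


def slot_impedance(z_dipole, n_elements=1, eta=ETA_0):
    """Input impedance of the slot complementary to an N-element folded dipole.

    Uses Z_slot = eta^2 / (4 * N^2 * Z_dipole), i.e. Eq. (4.3) for N = 1
    and Eq. (4.4) for N > 1.
    """
    return eta ** 2 / (4 * n_elements ** 2 * z_dipole)


z_dipole = 73.0  # assumed resonant half-wave dipole resistance in ohms
for n in (1, 2, 3):
    print(f"N = {n}: Z_slot = {slot_impedance(z_dipole, n):.1f} ohm")
```

The single slot comes out slightly below 500 Ω, and each increase in N divides the impedance by N squared, which is why a two-element folded slot lands in the vicinity of 100 Ω.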


Therefore, when N = 2, an input impedance of around 100 Ω can be realized. Further reduction of
the antenna impedance is achieved by controlling the stub size. Fig. 4.3 shows a folded slot
antenna designed for 94 GHz. In terms of the antenna bandwidth, the Chu-Harrington and McLean
limits relate the quality factor Q (inverse fractional bandwidth) of an ideal, perfectly efficient
antenna to its size, denoted by the radius $r_{\lambda,c}$ of the boundary sphere, as follows [27]:

$$\frac{1 + 2(2\pi r_{\lambda,c})^2}{(2\pi r_{\lambda,c})^3\left(1 + (2\pi r_{\lambda,c})^2\right)} \le \frac{f_c}{BW} \tag{4.5}$$

where $f_c = \sqrt{f_l f_h}$ is the center frequency and $r_{\lambda,c}$ is the boundary sphere
radius in units of wavelengths at the center frequency.
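
The bound in Eq. (4.5) is easy to evaluate numerically. The sketch below computes the left-hand side as a minimum Q and its reciprocal as an upper bound on fractional bandwidth; the function names are hypothetical, and treating 1/Q as the fractional bandwidth is only a rough approximation for an ideal, lossless antenna.

```python
import math


def min_q(radius_wavelengths):
    """Left-hand side of Eq. (4.5) for a boundary sphere radius given in wavelengths."""
    ka = 2 * math.pi * radius_wavelengths
    return (1 + 2 * ka ** 2) / (ka ** 3 * (1 + ka ** 2))


def max_fractional_bandwidth(radius_wavelengths):
    """Upper bound on BW / f_c implied by Eq. (4.5), taken as 1 / Q_min."""
    return 1 / min_q(radius_wavelengths)


for r in (0.05, 0.10, 0.15, 0.20):
    print(f"r = {r:.2f} wavelengths: Q_min = {min_q(r):6.2f}, "
          f"BW/f_c <= {max_fractional_bandwidth(r):.2f}")
```

The minimum Q falls steeply as the enclosing sphere grows, so a larger radiator is permitted a wider bandwidth.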

Therefore, increasing the size of a slot increases the physical aperture of the antenna, which in
turn increases the antenna bandwidth. The folded slot can also achieve a wideband characteristic
if the slot is made wide. Fig. 4.4 shows the designed “fat” folded slot antenna. Compared with the
previously designed slot, which was 700 µm x 30 µm, the slot is widened to 515 µm x 515 µm to
increase the aperture of the antenna. In HFSS simulation, its bandwidth increased from 10% to 50%.
Bandwidth therefore has to be traded off against the substrate area consumed. To achieve a
wider-band matching characteristic, an internal stub at the center of the rectangular slot can be
tuned [18].

Figure 4.3: Designed folded slot antenna (slot size: 700 µm x 30 µm).

Bow-tie Slot Antenna

The bow-tie slot antenna is the dual of the bow-tie dipole antenna, which belongs to the category
of frequency-independent antennas. This antenna can therefore have a wide input matching
characteristic. Because the slot is formed in the ground plane, the antenna can be easily fed by
CPW. Fig. 4.5 shows the designed bow-tie slot antenna with a stub [17]. The design was based on a
generic 90 nm CMOS process with an oxide layer thickness of 6 µm. The top metal layer (Metal 7) was
used as the antenna element. The stub was used to increase the bandwidth, and it lowers the antenna
input impedance to around 50 Ω.

4.2.4  Antenna  Array  Considerations 

When multiple antennas are used as an array, each unit element is affected by mutual coupling with
the surrounding antenna elements. The mutual coupling distorts the radiation pattern of the unit
elements and changes the input impedance characteristics. Moreover, it causes blind spots in the
phased array system [32]. Mutual coupling effects must therefore be considered early in the design
process. Under the infinite array condition, the array pattern $F(\theta,\phi)$ is approximated as
follows:

$$F(\theta,\phi) = g_{ae}(\theta,\phi)\, f(\theta,\phi) \tag{4.6}$$

where $g_{ae}(\theta,\phi)$ is the average active-element pattern and $f(\theta,\phi)$ is the array
factor of the antenna array. Fig. 4.6 shows a 3x3 array of folded slot antennas. In HFSS
simulation, the radiation pattern changed slightly from that of a single folded slot antenna. This
simulation setup captures all electromagnetic effects except edge effects. However, it is
impractical to run a 3-D EM simulation for a large array because of the simulation time required.

Figure 4.4: Fat folded slot antenna (slot size: 515 µm x 515 µm).
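
The pattern multiplication in Eq. (4.6) can be sketched for a small uniform planar array. The code below is an illustration under stated assumptions: uniform amplitude, a rectangular grid with spacing given in wavelengths, the array factor f taken as the squared magnitude of the complex sum, and a simple cos(theta) placeholder standing in for the average active-element pattern. None of these choices come from the simulations in this chapter, and the function names are hypothetical.

```python
import cmath
import math


def planar_array_factor(theta, phi, nx, ny, dx, dy, theta0=0.0, phi0=0.0):
    """Complex array factor of a uniform nx-by-ny array steered to (theta0, phi0).

    dx and dy are element spacings in wavelengths; angles are in radians.
    """
    u = math.sin(theta) * math.cos(phi) - math.sin(theta0) * math.cos(phi0)
    v = math.sin(theta) * math.sin(phi) - math.sin(theta0) * math.sin(phi0)
    total = 0j
    for m in range(nx):
        for n in range(ny):
            total += cmath.exp(2j * math.pi * (m * dx * u + n * dy * v))
    return total


def array_pattern(theta, phi, element_pattern, **array_kwargs):
    """Eq. (4.6): element pattern times array factor (here |AF|^2)."""
    af = planar_array_factor(theta, phi, **array_kwargs)
    return element_pattern(theta, phi) * abs(af) ** 2


def placeholder_element(theta, phi):
    """Assumed cos(theta) element pattern, not a simulated one."""
    return max(math.cos(theta), 0.0)


for deg in (0, 15, 30, 45, 60):
    value = array_pattern(math.radians(deg), 0.0, placeholder_element,
                          nx=3, ny=3, dx=0.5, dy=0.5)
    print(f"theta = {deg:2d} deg: F = {value:6.2f}")
```

A sketch like this captures only the geometric part of the array response. Mutual coupling enters through the element pattern, which is why the active-element pattern has to come from an EM simulation or measurement.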

When the number of unit elements becomes large enough, the unit elements couple to each other and
the effective radiation patterns become symmetric. For a large enough array, the infinite array
approximation can be assumed. The infinite array environment was modeled using the Master-Slave
boundary setup in HFSS with a PML (Perfectly Matched Layer) boundary condition [7]. The
Master-Slave setup replicates the fields, accounting only for the phase difference, under the
assumption of an infinite array environment. In wafer-scale radio applications, hundreds of
antenna elements or more would be integrated on a single wafer, so it is reasonable to apply the
infinite array approximation. However, the infinite array approximation does not capture edge
effects, which affect the input impedance of the antennas near the edge as well as the antenna
side-lobes.

Figure 4.5: Designed bow-tie slot antenna.

In a phased array, the input impedance (scan impedance) changes with the array scan angle. When
the scan reflection coefficient ($\Gamma_s$) has a magnitude of unity for a certain scan angle,
that angle is called a “blind spot”. At the blind spot, the scan element pattern is zero and all
input power is converted to surface-wave power. This blind spot cannot be found from either an
average active-element pattern or the array factor of the array. Therefore, it has to be
characterized using a proper EM simulation or analytically. In order to include this effect, the
average active-element pattern can be expressed as follows [33]:

$$g_{ae}(\theta_0,\phi_0) \approx g_i(\theta_0,\phi_0)\left(1 - |\Gamma_s(\theta_0,\phi_0)|^2\right) \tag{4.7}$$

where $|\Gamma_s(\theta_0,\phi_0)| = 1$ when the scan angle ($\theta_0$) is at the blindness
angle. It is known that the active element pattern of the infinite planar printed dipole array has
scan blindness at an angle of about 46 degrees. Another consideration in a large array is the lack
of surface-wave loss for the infinite array except at the blindness angle [32]. Therefore, for a
large antenna array, ohmic loss becomes an important factor. From this perspective, the
wafer-thinning technique, which physically removes a large portion of the lossy silicon substrate,
is an appropriate approach to improving radiation efficiency. However, further investigation is
required into how surface-wave effects would change the blind-spot characteristics. For a small
antenna array, the effect of mutual coupling on the input impedance of each element as well as on
the radiation pattern is captured. Therefore, antenna elements near the edge of the wafer have to
be designed separately.
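
The mismatch factor in Eq. (4.7) can be illustrated with a short sketch. The text does not define g_i; here it is treated as the element gain without scan mismatch, which is an assumption. The function name and the reflection coefficient values are hypothetical and serve only to show how the active-element gain collapses as the magnitude of the scan reflection coefficient approaches one.

```python
import math


def active_element_gain(element_gain, scan_reflection):
    """Eq. (4.7): g_ae = g_i * (1 - |Gamma_s|^2) for a passive array element."""
    magnitude = abs(scan_reflection)
    if magnitude > 1:
        raise ValueError("|Gamma_s| cannot exceed 1 for a passive element")
    return element_gain * (1 - magnitude ** 2)


g_i = 2.0  # assumed linear element gain, for illustration only
for gamma in (0.0, 0.3 + 0.2j, 0.7, 0.95, 1.0):
    g_ae = active_element_gain(g_i, gamma)
    loss_db = -10 * math.log10(g_ae / g_i) if g_ae > 0 else float("inf")
    print(f"|Gamma_s| = {abs(gamma):.2f}: g_ae = {g_ae:.3f}, scan mismatch loss = {loss_db:.2f} dB")
```

At a magnitude of one the gain is exactly zero, which corresponds to the blind spot described above.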

4.3  Simulation  Results  and  Discussions 

HFSS ver. 10 and ver. 11 were used for the 3-D electromagnetic simulations presented in this
section. The silicon dielectric loss tangent was not considered, and the conductivity of the
silicon substrate was set to 10 mS/m.

Figure 4.6: 3x3 array folded slot antenna.

4.3.1  Circular  monopole  antenna 

A circular monopole antenna was designed in a 0.13 µm BiCMOS process (STMicroelectronics B9MW)
with an oxide thickness of 12 µm. The top metal layer (Metal 6), with a thickness of 3 µm, was used
for the antenna structure. Fig. 4.7 shows the radiation pattern of the designed circular monopole
antenna simulated in HFSS v10. When the substrate is thinned to 50 µm, the radiation efficiency is
75.4%. For a substrate thickness of 100 µm, the radiation efficiency is 63%. The radiation pattern
is a typical omni-directional pattern, which is isotropic in the H-plane.

The peak directivity of this antenna is 2.24 dBi. The designed circular disk monopole antenna
provides more than 40 GHz of bandwidth for 50 Ω input matching. The diameter of the disk is
700 µm. One of the main drawbacks of this antenna is the effect of the ground plane underneath the
disk. This ground plane not only affects the single antenna element but also makes it difficult to
employ the antenna as a unit element of an array, since each unit antenna is affected by the
ground-plane images created by its neighboring elements.

Figure 4.7: The radiation pattern of the circular disk monopole antenna.

4.3.2  Simulation  Results  of  Folded  Slot  Antenna 

For the designed folded slot antenna, the slot length is 700 µm and its width is 30 µm. The
structure is shown in Fig. 4.3. The stub is 600 µm long and 10 µm wide; the gap at the
bottom is 6.6 µm and the gap at the top is 13.4 µm. Fig. 4.8 shows the radiation pattern
of the designed antenna. The pattern is broadside, with a maximum directivity of 4.98 dBi
in the direction of the substrate and 3.43 dBi toward air. Fig. 4.9 presents
the radiation efficiency as a function of substrate thickness.

The radiation efficiency is 57% when the substrate thickness is set to 100 µm. For substrate
thicknesses of about 200 µm and above, the efficiency degrades steeply owing to surface-wave
mode excitation (Fig. 4.9).

As shown in Fig. 4.10, the designed folded slot antenna has a bandwidth of 10%. As
described in Section 4.2.3, the slot size should be increased to achieve a wider bandwidth.
The designed folded slot antenna provides more than 50 GHz of bandwidth when the input impedance
of the signal source is 40 Ω, as shown in Fig. 4.11. As shown in Fig. 4.4, the designed “fat” version
of the folded slot antenna has a larger aperture. When the substrate thickness is 100 µm, the simulated
radiation efficiency of the fat version is 65.1%, which is slightly better than that of the circular disk
monopole antenna. The three-dimensional radiation pattern is shown in Fig. 4.12. The peak
directivity of this antenna is 3.51 dBi.

Figure 4.8: The radiation pattern of the folded slot antenna.

4.3.3  Simulation  Results  of  Bow-tie  Slot  Antenna 

A bow-tie slot antenna was designed in a 90 nm CMOS process, which has an oxide layer
thickness of 6 µm. The 0.9 µm thick top metal layer (Metal 7) was used as the antenna element.
Fig. 4.13 shows the radiation pattern of the designed bow-tie slot antenna. Owing to wafer thinning,
the radiation pattern was almost symmetric between top and bottom. For a 50 Ω source impedance,
the center frequency is 90 GHz and the bandwidth is 17 GHz. The radiation
efficiency is 52% when the substrate thickness is 50 µm, and 46% when the thickness is 100 µm. The
peak directivity of the antenna was 5.4 dBi.
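The directivity and efficiency values quoted in Sections 4.3.1 to 4.3.3 can be combined into an approximate gain, G = D + 10 log10(η) in dB. The sketch below only performs this arithmetic on the numbers given above. It ignores impedance mismatch, assumes that each quoted peak directivity applies at the listed substrate thickness (the text does not state this), and uses the illustrative helper name `gain_dbi`.

```python
import math


def gain_dbi(directivity_dbi, efficiency):
    """Gain in dBi from peak directivity (dBi) and radiation efficiency (0..1)."""
    return directivity_dbi + 10.0 * math.log10(efficiency)


# (label, peak directivity in dBi, radiation efficiency) as quoted in the text.
# Assumption: the quoted directivity holds at the listed substrate thickness.
ANTENNAS = [
    ("circular monopole, 50 um", 2.24, 0.754),
    ("circular monopole, 100 um", 2.24, 0.63),
    ("folded slot, 100 um", 4.98, 0.57),
    ("fat folded slot, 100 um", 3.51, 0.651),
    ("bow-tie slot, 50 um", 5.4, 0.52),
    ("bow-tie slot, 100 um", 5.4, 0.46),
]

if __name__ == "__main__":
    for label, d_dbi, eta in ANTENNAS:
        print(f"{label:28s} gain = {gain_dbi(d_dbi, eta):5.2f} dBi")
```

The same conversion shows that a 40% radiation efficiency corresponds to a loss of about 4 dB.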
Figure 4.9: The radiation efficiency of the folded slot antenna depending on substrate thickness.

4.3.4  Simulation  Results  for  Antenna  Array 

In order to simulate a large array, the infinite array environment was modeled using the Master-Slave
boundary setup in HFSS with a PML (Perfectly Matched Layer) boundary condition. Fig. 4.14 shows
the boundary condition setup for the active element. Fig. 4.15 compares the radiation
patterns of the folded slot antenna extracted from HFSS: the left pattern is for a single
antenna element within the 3 x 3 array, and the right pattern is extracted using the Master-Slave setup.
The difference in directivity of the active antenna element between the two cases was around 1 dB,
and the main difference occurs around the null. Therefore, the infinite array approximation is an
effective tool for examining large arrays despite the error from neglecting the substrate edge effect.

The same simulation setup was applied to the bow-tie slot antenna with a substrate thickness
of 100 µm, as shown in Fig. 4.16. The radiation pattern of the active element is more
symmetric. The Master-Slave setup replicates the fields considering only the phase difference,
under the assumption of an infinite array environment. As shown in Fig. 4.17, this array has
20 dB of antenna gain, and the side-lobe level (SLL) was 13.4 dB below the main lobe, as expected
for a uniformly excited array. The radiation efficiency of all three antenna arrays
degraded to around 40%.
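As a rough cross-check of the array figures above, the sketch below evaluates the ideal array factor of a uniformly excited linear array. It is not the HFSS model: element patterns and coupling are ignored, and the function names are illustrative. For 10 elements per side it gives a first side lobe about 13 dB below the main lobe, and combining 10 x 10 identical elements gives an ideal 20 dB increase over a single element.

```python
import math


def array_factor_db(n_elements, psi):
    """Normalized array factor (dB) of a uniform linear array.

    psi is the total inter-element phase difference in radians.
    """
    denom = n_elements * math.sin(psi / 2.0)
    if abs(denom) < 1e-12:
        return 0.0  # main-lobe (or grating-lobe) peak
    af = abs(math.sin(n_elements * psi / 2.0) / denom)
    return 20.0 * math.log10(max(af, 1e-12))


def first_sidelobe_db(n_elements, samples=20000):
    """Peak of the first side lobe, searched between the first two nulls."""
    lo = 2.0 * math.pi / n_elements
    hi = 4.0 * math.pi / n_elements
    return max(
        array_factor_db(n_elements, lo + (hi - lo) * k / samples)
        for k in range(1, samples)
    )


def array_gain_db(n_total):
    """Ideal directivity increase from combining n_total identical elements."""
    return 10.0 * math.log10(n_total)


if __name__ == "__main__":
    print(f"first side lobe, N = 10: {first_sidelobe_db(10):.2f} dB")
    print(f"array gain, 10 x 10:     {array_gain_db(100):.1f} dB")
```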

Using the Master-Slave setup, the phase difference between each pair of boundaries was used as the scan
angle input. Impedance layers of 377 Ω were also inserted on top of each PML layer. To
account for the vector property of the scan angle, the impedance was scaled with the scan angle θ0 as
376.7 cos(θ0) Ω. Fig. 4.18 shows S11 of the infinite folded slot antenna array. Because the folded slot is
the complement of the printed dipole antenna, the blindness spot is expected to be at the same
location. The resulting blindness spot was around 46 degrees, which is similar to the theoretical value.

Figure 4.10: S11 of the folded slot antenna.
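The two scan-dependent inputs described above can be computed as in the following sketch. The impedance scaling 376.7 cos(θ0) is taken from the text; the phase delay between a master/slave pair uses the standard Floquet relation k0 d sin(θ0) cos(φ0), which is assumed here rather than quoted from the report. The element pitch and frequency in the usage example are placeholders, not values from the design.

```python
import math

ETA_0 = 376.7  # free-space wave impedance in ohms, as used in the text
C_0 = 299_792_458.0  # speed of light in m/s


def impedance_sheet_ohms(theta0_deg):
    """Scan-dependent impedance of the layer above the PML: 376.7*cos(theta0)."""
    return ETA_0 * math.cos(math.radians(theta0_deg))


def boundary_phase_deg(pitch_m, freq_hz, theta0_deg, phi0_deg=0.0):
    """Phase delay (degrees) between a master/slave pair separated by pitch_m.

    Assumes the pair is separated along x and the beam is scanned to
    (theta0, phi0); this is the usual Floquet phase k0*d*sin(theta0)*cos(phi0).
    """
    k0 = 2.0 * math.pi * freq_hz / C_0
    psi = (
        k0
        * pitch_m
        * math.sin(math.radians(theta0_deg))
        * math.cos(math.radians(phi0_deg))
    )
    return math.degrees(psi)


if __name__ == "__main__":
    # Hypothetical pitch (1.5 mm) and frequency (90 GHz), for illustration only.
    for theta in (0.0, 30.0, 46.0, 60.0):
        z = impedance_sheet_ohms(theta)
        p = boundary_phase_deg(1.5e-3, 90e9, theta)
        print(f"theta0 = {theta:4.1f} deg  Z = {z:6.1f} ohm  phase = {p:6.1f} deg")
```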


4.4  Conclusion 

We found that wafer thinning is an attractive way of improving the antenna radiation
efficiency, considering cost and the realizability of an antenna array structure on the wafer.
We designed three different types of antennas to investigate the feasibility of a fully integrated
antenna on a lossy silicon wafer. The radiation efficiency can exceed 70% when the silicon
substrate is thinned down to 50 µm for the designed antenna structures. To design the
antenna array, we considered mutual coupling under the infinite array approximation. The blindness
spot found for the infinite array using the HFSS Master/Slave boundary condition corresponded well
with the reported analytical result [32]. The simulation results showed that a fully integrated
antenna is possible on a silicon wafer, with a radiation efficiency of more than 40% for a
substrate thickness of 100 µm. Considering the 3-4 dB insertion loss caused by off-chip wire-bonding
and electrostatic discharge (ESD) protection circuits, an on-wafer integrated antenna would be
advantageous and would obviate the need for mm-wave packaging and testing.

Figure 4.11: S11 of the “fat” folded slot antenna.
Figure 4.12: 3-D radiation pattern of the “fat” folded slot antenna.

Figure 4.13: 3-D radiation pattern of the bow-tie slot antenna.

Figure 4.14: Master/Slave boundary setup for an active element in the infinite array condition.

Figure 4.15: Comparison of the active-element radiation pattern between full EM simulation (left)
and infinite array approximation (right) for the 3 x 3 folded slot antenna array.

Figure 4.16: Master/Slave boundary setup for an active element in the infinite array condition.

Figure 4.17: 10 x 10 bow-tie slot antenna array in the infinite array condition
and its radiation pattern.

Figure 4.18: S11 with respect to scan angle for the folded slot antenna in the infinite array condition.
Chapter 5

Clock  Distribution  and  Synchronization 


5.1  System  Synchronization  Design  Considerations 

5.1.1  Introduction 

The power of three clock networks (H-tree, distributed PLLs [20], and coupled oscillators) is
investigated as a function of chip size for synchronous digital systems. In [20], the relations
between the clock uncertainty and the parameters of the H-tree and distributed-PLL clock networks
are given. It was shown that the distributed-PLL clock network is superior in terms of scaling
across technology nodes. However, to determine which clock network is preferable in a wafer-scale
system, the clock power is estimated and compared as the chip size scales, for a
specified clock uncertainty. It is concluded that the distributed clock networks consume much less
power than the traditional H-tree network as chip size scales. Moreover, a coupled-oscillator network
is promising when applied to large-scale synchronous digital systems.

5.1.2  Methods,  Assumptions,  and  Procedures 

Technology 

The power of each clock network is normalized to the case where a 1-GHz clock is distributed to
a total load capacitance of 555 pF over a chip size of 2 cm x 2 cm in a 90-nm CMOS technology,
while each clock network needs to meet a clock uncertainty of 115 ps (11.5% of the clock cycle)
across the chip. A 1-µm-wide wire on Metal-9 has a sheet resistance $R_w$ = 0.046 Ω/□ and a capacitance
$C_w$ = 0.03 fF/µm. The equivalent resistance $R_\square$ and input capacitance $C_g$ of an inverter are 25.57
kΩ/□ and 1.17 fF/µm, respectively.

Figure 5.1: H-Tree clock network with level n = 4.

Model  of  Clock  Uncertainty  and  Clock  Power 
H-Tree 

Fig. 5.1 shows an H-Tree clock network with level n = 4. An n-level tree results in $2^n$ tiles,
and the length L from the root to a leaf is (chip dimension) x $(1.5 - (1/2)^{n/2})$. We have assumed
that the H-Tree is optimally buffered [35] and that the width of the tree halves after each branching.
The clock uncertainty comes from the uncertainty from the root to the leaves of the H-tree and
from the delay mismatches from the leaves to the flip-flops within the tile, called the internal skew
$t_{sk,int}$ [20]. The delay uncertainty from the root to the leaves, due to the variability in the buffered
segments and to time-varying noise coupled from signal transitions on nearby wires (called jitter,
$t_{jitter}$), is linearly proportional to L [20] and is determined by the incorporated variability model.
The internal skew grows with the square of the length from the leaves to the clock
loads, that is, with the area of the tile. The clock uncertainty Δt can then be formulated as

$$\Delta t = \alpha\, t_{H\text{-}tree} + t_{jitter} + t_{sk,int} \qquad (5.1)$$

where α is assumed to be 0.1. For an 8-level tree, L = 2.875 cm. The length of the optimal
buffer segment and the optimal buffer size are calculated to be 2.4 mm and 73 fF, which gives
a delay from the root to the leaves of 270 ps. The first term, the uncertainty in the delay
from the root to the leaves, is therefore 27 ps. To estimate the jitter, it
is assumed that up to 5% of the capacitance of any wire may transition during the time a clock
edge propagates [20]. Given a wire capacitance of 72 fF in each segment and an input and
output capacitance of 73 fF for the inverter buffer (assuming γ = 1), the variation in delay is
±(5% x 72)/(72 + 72 + 73) = ±1.6%. So the total variation would be 3.2%, or 8.6 ps. The maximum
internal skew is 0.046 x 625 µm x 514.7 fF = 15.5 ps. So the total clock uncertainty is 51 ps. The
total clock power is contributed by the total wire capacitance of the tree and the buffers, and is
proportional to $2^{n/2}$.
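The H-Tree numbers above can be reproduced with a few lines of arithmetic. The sketch below is only a worked check, with illustrative function names. The root-to-leaf length uses D(1.5 - (1/2)^(n/2)), which is the reading of the length formula that reproduces L = 2.875 cm for n = 8 on a 2 cm chip. The component delays (270 ps, 8.64 ps, 15.5 ps) are taken as given rather than re-derived.

```python
def htree_tiles(levels):
    """Number of tiles served by an n-level H-tree."""
    return 2 ** levels


def htree_root_to_leaf_cm(chip_dim_cm, levels):
    """Root-to-leaf wire length, D * (1.5 - (1/2)**(n/2))."""
    return chip_dim_cm * (1.5 - 0.5 ** (levels / 2))


def jitter_fraction(coupled_share, c_wire_ff, c_in_ff, c_out_ff):
    """One-sided delay variation when a share of the wire capacitance switches.

    The text adds the capacitances as 72 + 72 + 73; the same sum is used here.
    """
    return coupled_share * c_wire_ff / (c_wire_ff + c_in_ff + c_out_ff)


def htree_uncertainty_ps(root_delay_ps, alpha, jitter_ps, internal_skew_ps):
    """Total clock uncertainty: alpha * t_H-tree + t_jitter + t_sk,int."""
    return alpha * root_delay_ps + jitter_ps + internal_skew_ps


if __name__ == "__main__":
    print("tiles:", htree_tiles(8))
    print("L [cm]:", htree_root_to_leaf_cm(2.0, 8))
    # The text rounds this to 1.6% before doubling it to 3.2%.
    print("jitter fraction:", round(jitter_fraction(0.05, 72.0, 72.0, 73.0), 4))
    print("total [ps]:", round(htree_uncertainty_ps(270.0, 0.1, 8.64, 15.5), 2))
```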

With  the  relations  between  clock  uncertainty/power  and  the  depth  of  the  tree,  the  required  tree 
depth  needed  to  meet  the  total  clock  uncertainty  for  different  chip  sizes  can  be  calculated. 

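$$\Delta t = \left(\alpha\, t_{H\text{-}tree0} + t_{jitter0}\right) \cdot s \cdot \frac{1.5 - 2^{-n/2}}{1.5 - 2^{-n_0/2}} + t_{sk,int0} \cdot s^2 \cdot \frac{2^{-n/2}}{2^{-n_0/2}} \le 115\ \text{ps} \qquad (5.2)$$

where $n_0$ = 8 (which is the minimum level that meets this constraint for s = 1), $\alpha\, t_{H\text{-}tree0}$ = 27 ps,
$t_{jitter0}$ = 8.64 ps, $t_{sk,int0}$ = 15.5 ps, and s is the scaling factor of the chip size.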


Figure 5.2: Distributed-PLL clock network with 16 tiles.


Distributed  PLLs 

Fig. 5.2 shows the distributed-PLL network, where each tile has its own local oscillator and there
is a phase detector at each boundary [20]. Ideally, the incorporated PLL is a Type-II feedback
system and hence the phases of the local oscillators are matched. However, every boundary
between tiles introduces some skew because of mismatch in the phase detector. As the number of
tiles increases, the number of boundaries, and hence the clock uncertainty across the chip, increases.
In addition, there are still internal skews within tiles.
The clock uncertainty and power for a distributed-PLL network with n x n tiles are formulated
as follows:

$$\Delta t = n^2\, t_{sk,PD} + t_{sk,int} \qquad (5.3)$$

where $t_{sk,PD}$ = 0.05 ps is the skew due to the mismatch in the phase detectors, $P_{osc}$ is the power of
an oscillator, and $P_{interconnect}$ is the power on the interconnect. As with the H-Tree, given the
relation between clock uncertainty/power and the number of tiles, the required number of tiles n
for different chip sizes can be calculated.
Here, $t_{sk,PD}$ = 0.05 ps and $n_0$ = 9 (which is the minimum n that meets this constraint for s = 1). The
resulting $t_{sk,int}$ is 0.046 x 1111 µm x 1.7 pF ≈ 87.5 ps, and the total clock uncertainty is 91.6 ps. The
clock power for s = 1 is normalized to be the same as that of the H-Tree for ease of comparison.


Figure  5.3:  Coupled-oscillator  clock  network  with  16  tiles. 


Coupled-Oscillator  Network 

Fig. 5.3 shows a coupled-oscillator network, which acts as a natural distributed-PLL network but
with a nonlinear Type-I characteristic. Specifically, in the steady state there are phase mismatches
between oscillators according to the differences in their free-running frequencies. The relation is
described by the Kuramoto model


$$\frac{d\theta_i}{dt} = \omega_i + \frac{K}{N}\sum_{j=1}^{N} \sin(\theta_j - \theta_i) \qquad (5.6)$$

where $\theta_i$ is the phase of each oscillator, $\omega_i$ the difference between the free-running frequency of
each oscillator and the average frequency, K the coupling strength, and N the number of oscillators.
Given the standard deviation of the free-running frequency, the standard deviation of the phase
mismatch can be found by setting $d\theta_i/dt = 0$ (in the steady state) and inverting the sine operation.
To simplify the analysis, we approximated the phase mismatch as $\Delta\theta$ with
$\sin(\Delta\theta) = \left(\sum_j \sin(\theta_j - \theta_i)\right)/N$. Hence the clock uncertainty is


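$$t_{sk,osc} = \frac{T}{2\pi}\,\Delta\theta = \frac{T}{2\pi}\sin^{-1}\!\left(\frac{\Delta\omega}{K}\right) \qquad (5.7)$$

where $\Delta\omega$ is the standard deviation of the free-running frequency and T is the clock period. If $\Delta\omega$ is
independent of the chip size, the clock uncertainty and power for a coupled-oscillator clock network with n x n tiles is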
Since the clock uncertainty between the coupled oscillators does not scale with the chip size, n is
linearly proportional to s. For $t_{sk,osc}$ = 48 ps, $n_0$ = 10 gives $t_{sk,int}$ = 0.046 x 1000 µm x 1.4
pF = 64 ps, which meets the constraint on the total clock uncertainty. Similarly, the power for s = 1 is
normalized to that of the H-Tree.
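The steady-state relation used above can be checked numerically. The sketch below integrates the Kuramoto model with a forward-Euler scheme for two oscillators and confirms that, once locked, the mean of sin(θj - θi) seen by oscillator i equals -ωi/K, which is the condition obtained by setting dθi/dt = 0. The detuning and coupling values are arbitrary illustrative numbers in normalized units, not design values.

```python
import math


def kuramoto_step(phases, omegas, coupling, dt):
    """One forward-Euler step of
    d(theta_i)/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)."""
    n = len(phases)
    updated = []
    for i in range(n):
        pull = sum(math.sin(phases[j] - phases[i]) for j in range(n))
        updated.append(phases[i] + dt * (omegas[i] + coupling / n * pull))
    return updated


def settle(omegas, coupling, dt=1e-3, steps=20000):
    """Integrate from equal initial phases until the network has locked."""
    phases = [0.0] * len(omegas)
    for _ in range(steps):
        phases = kuramoto_step(phases, omegas, coupling, dt)
    return phases


def mean_sine(phases, i):
    """(1/N) * sum_j sin(theta_j - theta_i), i.e. sin(delta_theta) in the text."""
    return sum(math.sin(p - phases[i]) for p in phases) / len(phases)


if __name__ == "__main__":
    omegas = [-0.1, 0.1]  # detuning from the mean frequency (normalized units)
    coupling = 1.0
    locked = settle(omegas, coupling)
    print("phase difference [rad]:", locked[1] - locked[0])
    print("mean sine seen by oscillator 0:", mean_sine(locked, 0))
```

Under the assumption that a phase mismatch Δθ maps to a time skew of Δθ·T/(2π), the 48 ps quoted above for a 1 GHz clock corresponds to a phase mismatch of roughly 0.3 rad.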


5.1.3  Results  and  Discussions 

Fig. 5.4 shows the power versus the scaling of the chip size for the above three cases. The power
for the distributed-PLL and coupled-oscillator cases is normalized to that of the H-Tree, which is
expressed in terms of the total clock load (555 pF). The power of the H-Tree grows exponentially
with the chip size, while the two distributed networks grow much more slowly. Even if
power is not a limiting factor, the chip size cannot keep increasing for the H-Tree and the
distributed PLLs, since the constraint on clock uncertainty can no longer be met. A
coupled-oscillator network, however, is not limited by this performance constraint as chip size scales. We
assume that the variation of the oscillation frequency does not depend on the chip size, which needs
further verification with simulation tools.

Figure 5.4: Power for the three clock networks versus chip dimension scaling factor.

The trend of clock power as chip size scales is investigated for three cases. Distributed
clock networks show significantly lower power than a traditional clock tree. Moreover, a
coupled-oscillator clock network is not performance-limited as chip size scales. Therefore, it is favorable
to apply a distributed clock network, such as coupled oscillators, in a wafer-scale system.

5.2  Clock  Distribution 

The distributed radio on the whole wafer is composed of thousands of individual transceivers,
creating a large, flexible phased array. Synchronization of these transceivers is mandatory for
achieving good beam-forming performance and the required directivity. The purpose of this
task is to study clock distribution and synchronization at the wafer scale. The distributed clock
would be used both for baseband data synchronization at a few GS/s and as a reference for the
90 GHz LO PLLs.

In the first part, the main criteria for the clock distribution scheme will be derived from
the constraints of the system. From a panel of architectures, a clock distribution using coupled
standing-wave oscillators has been shown to be promising for this large-scale application. The design
flow and simulation tools will be introduced, and the transmission line model will be described, in this
preliminary part.

In a second section, the coupled standing-wave oscillator architecture will be reviewed and a
design methodology will be presented for reducing both the clock skew between clusters and the
global power consumption. Issues such as tolerance to process variations, clock buffer design,
and locking range will be addressed for this particular implementation, to evaluate whether this scheme
is suitable for a wafer-scale clock distribution with good synchronization. Finally, as integration
is of obvious interest in this project, this clock distribution scheme should be co-designed with
the antenna pattern, and the perturbations caused by this scheme on the radiation pattern must be investigated.
We propose here a clock distribution pattern that fits the antenna pattern and is well suited to
cluster-based distribution, as the clock skew between clusters is reduced. From the defined pattern,
coupling effects at large scale must be simulated to ensure that there are no locking
issues between clusters.

This study has shown that coupled standing-wave oscillators are very good candidates for
clock distribution at the wafer scale in terms of acceptable synchronization and reduced power
consumption. Although the simulation results seem promising, further work is needed to demonstrate
the feasibility of the proposed co-designed clock distribution pattern on a small scale before
addressing the large-scale locking issues that can appear on a whole wafer.

5.3  Methods,  Assumptions,  and  Procedures 

5.3.1  Assumptions  for  the  wafer-scale  system 

For this study, we will assume that the reticles over the wafer can be stitched to each other with
good alignment to ensure communication between them for clocks, control signals, and data
networks. This is commonly done in image sensor chip design, where an area larger than one reticle
is needed [23] [8]. Furthermore, each reticle, which may contain several transceivers, would be
identical. This defines one of the biggest constraints for the clock distribution.

Another assumption is that power is supplied to all reticles in the same way. As antennas would
be integrated on top of the wafer, power can be supplied either from the sides of the wafer or
through the backside using through-substrate interconnects (TSI). The first option does not seem
to be feasible because of the different travel lengths to reticles in the middle and at the periphery of
the wafer. Through-substrate power delivery is more suitable and is assumed here. In the
continuation of this project, this system-level issue must be addressed; it opens a good research
field on 3D integration, closely related to the wafer-scale project.

5.3.2 Criteria for the clock distribution and architecture choice

The main criteria for choosing a convenient distribution scheme are the ability to be tiled
and repeated and to provide low skew. Furthermore, the power consumption, which depends on the
frequency of operation and the granularity of the scheme, is another important criterion. The following
architectures for clock distribution have been identified and discussed:

• H clock trees or grids. The clock tree implementation suffers from the fact that it is not
easily patternable and presents large clock skew. A grid can help reduce clock skew but
increases power consumption during transients and has a slower edge rate.

• Clock trees with feedback [19]. Feedback can be included to compensate for the clock
tree delays but, on a whole wafer, it appears hard to compare nodes placed far away from each
other.

• Bi-directional signaling [34]. A bidirectional transmission line travels all over the wafer, and
average-time extractors are used to pick up a synchronous clock at any point. This approach
provides low skew, and a clever floorplan can be used to reach all reticles. The disadvantage
is that it requires that there be no defects on the wafer reticles: if a reticle does not work properly,
the line is cut and all following reticles are disconnected.

• Rotary travelling-wave oscillators [43]. A wave travels along a differential transmission line
whose ends are inverted and connected together. The phase varies linearly along the transmission
line. By coupling another oscillator at the right location, the right phase of the signal
can be extracted to lock the oscillators together. This scheme appears suitable for
wafer-scale distribution.

• Coupled standing-wave oscillators [29] [6]. In contrast to travelling waves, the standing-wave
structure reflects the waves travelling along its lines, creating a signal of constant phase,
independent of position. Nevertheless, the amplitude varies from one position to
another. This architecture requires compensation of the line losses.

• Coupled ring oscillators [22] [14]. Ring oscillators would be designed locally in each cluster
and then connected to their neighbors to lock them together. The obvious limitation is the
long distance between clusters, which adds latency to the coupling scheme
and thus increases the clock skew.

• Distributed PLLs [21]. VCOs are distributed over the clusters, and phase detectors are
implemented at the boundary between two clusters. The PLLs then correct for the phase and
frequency mismatch between two VCO cores. The same limitation as for coupled ring oscillators
applies, as the distance between the VCOs and the phase detectors would be significant.

Rotary travelling-wave and coupled standing-wave oscillators seem suitable for wafer-scale
clock distribution, as they inherently cover large areas and can easily be physically coupled to
their closest neighbors. Standing-wave oscillators are particularly interesting because
the phase of the oscillation remains constant all along the transmission lines, which is well suited
to coupled structures. Travelling-wave oscillators rotate the phase and require special care to be
coupled together. For this reason, the focus will be on the study of coupled standing-wave
oscillators.
5.3.3 Design flow and simulation tools

To evaluate the potential of coupled standing-wave oscillators, several tools have been used
together. For the line model, the ADS Momentum electromagnetic simulator has been used to extract
the S-parameters of a transmission line in a nanoscale CMOS back-end process and to create an
RLCG model of a small portion of the line. Cadence simulation tools have been used to simulate
the transmission lines together with the compensation cells and to evaluate the performance in
terms of power consumption, clock skew, and amplitude swing. A theoretical study has been
performed using Matlab to define a design methodology for sizing the parameters according to the
desired performance and to validate the Cadence simulation results.

In further steps of the design, electromagnetic simulators would be used for simulating the
co-integrated antennas with the coupled standing-wave oscillators, and large-scale numerical
simulators, such as the Spice++ simulator, would be used to detect any locking issue between oscillators
at the wafer scale.

Figure 5.5: Model for the transmission line in a top metal layer with infinite ground.


5.3.4  Transmission  Line  model 

Coupled standing-wave oscillators use transmission lines in a nanoscale CMOS back-end that have
to be modeled. A simple model for transmission lines in a top metal layer has been built in
Momentum to derive the RLCG parameters of the line. The model used is described in Fig. 5.5 and
uses a low metal layer as a ground. This ground is modeled with an infinite thickness. S-parameters
are extracted for differential lines 12 µm wide with 4 µm spacing. Frequency-dependent RLCG
parameters are then obtained using the following formulas [16] [26] and are imported into Cadence
circuit simulation tools as an mtline element.
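First, the S-parameters must be converted into ABCD parameters using:

$$A = \frac{(1+S_{11})(1-S_{22}) + S_{12}S_{21}}{2S_{21}} \qquad (5.10)$$

$$B = Z_{0,port}\,\frac{(1+S_{11})(1+S_{22}) - S_{12}S_{21}}{2S_{21}} \qquad (5.11)$$

$$C = \frac{1}{Z_{0,port}}\,\frac{(1-S_{11})(1-S_{22}) - S_{12}S_{21}}{2S_{21}} \qquad (5.12)$$

$$D = \frac{(1-S_{11})(1+S_{22}) + S_{12}S_{21}}{2S_{21}} \qquad (5.13)$$

Next, $Z_0$ and $\gamma$ are computed from the ABCD parameters using:

$$Z_0 = \sqrt{\frac{B}{C}} \qquad (5.14)$$

$$\gamma = \frac{\operatorname{arccosh}(A)}{l} \qquad (5.15)$$

where $l$ is the length of the simulated line. Finally, the RLCG parameters are extracted from $Z_0$ and $\gamma$ using:

$$R = \Re(Z_0\gamma) \qquad (5.16)$$

$$L = \frac{\Im(Z_0\gamma)}{\omega} \qquad (5.17)$$

$$G = \Re(\gamma/Z_0) \qquad (5.18)$$

$$C = \frac{\Im(\gamma/Z_0)}{\omega} \qquad (5.19)$$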

As an example, the RLCG parameters at 10 GHz are R = 3.7495 kΩ/m, L = 297.75 nH/m, G
= 8.5257 mS/m, and C = 142.88 pF/m. This model uses an infinite ground and will be used for
analysis purposes only. For a more realistic model, a finite ground thickness must be introduced,
and more sophisticated structures, such as a slotted ground, should be included.
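The extraction chain above (S-parameters to ABCD, then to Z0 and γ, then to RLCG) can be exercised with a round-trip check. The sketch below is not the Momentum/Cadence flow: it synthesizes the S-parameters of a uniform line from the RLCG values quoted at 10 GHz, then recovers them with equations (5.10) to (5.19). The 1 mm line length and 50 Ω port impedance are assumptions chosen for the test, the function names are illustrative, and the principal branch of arccosh is only valid while the line is shorter than half a wavelength.

```python
import cmath
import math


def rlcg_to_abcd(r, l, g, c, omega, length):
    """ABCD matrix of a uniform line with per-metre R, L, G, C."""
    z_series = complex(r, omega * l)
    y_shunt = complex(g, omega * c)
    gamma = cmath.sqrt(z_series * y_shunt)
    z0 = cmath.sqrt(z_series / y_shunt)
    gl = gamma * length
    return cmath.cosh(gl), z0 * cmath.sinh(gl), cmath.sinh(gl) / z0, cmath.cosh(gl)


def abcd_to_s(a, b, c, d, z_port):
    """Two-port S-parameters for equal, real port impedances."""
    den = a + b / z_port + c * z_port + d
    s11 = (a + b / z_port - c * z_port - d) / den
    s12 = 2.0 * (a * d - b * c) / den
    s21 = 2.0 / den
    s22 = (-a + b / z_port - c * z_port + d) / den
    return s11, s12, s21, s22


def s_to_abcd(s11, s12, s21, s22, z_port):
    """Equations (5.10) to (5.13)."""
    a = ((1 + s11) * (1 - s22) + s12 * s21) / (2 * s21)
    b = z_port * ((1 + s11) * (1 + s22) - s12 * s21) / (2 * s21)
    c = ((1 - s11) * (1 - s22) - s12 * s21) / (2 * s21) / z_port
    d = ((1 - s11) * (1 + s22) + s12 * s21) / (2 * s21)
    return a, b, c, d


def abcd_to_rlcg(a, b, c, omega, length):
    """Equations (5.14) to (5.19); principal branch of arccosh."""
    z0 = cmath.sqrt(b / c)
    gamma = cmath.acosh(a) / length
    series = z0 * gamma  # R + j*omega*L
    shunt = gamma / z0  # G + j*omega*C
    return series.real, series.imag / omega, shunt.real, shunt.imag / omega


if __name__ == "__main__":
    omega = 2.0 * math.pi * 10e9
    length = 1e-3  # assumed segment length in metres
    abcd = rlcg_to_abcd(3749.5, 297.75e-9, 8.5257e-3, 142.88e-12, omega, length)
    s_params = abcd_to_s(*abcd, 50.0)
    a, b, c, _ = s_to_abcd(*s_params, 50.0)
    print(abcd_to_rlcg(a, b, c, omega, length))
```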


5.4  Results  and  Discussions 

5.4.1 Standing-wave oscillator description

A standing-wave oscillator is basically a λ/2 differential transmission line shorted at both ends.
If we consider an ideal transmission line, injecting current at the center of the line launches a wave
that propagates to the edge, reflects off the shorted node, and returns to where it came from,
creating a standing wave on the line. The particularity of this standing wave is that its phase is the
same at any point of the line. Nevertheless, the amplitude of the clock is maximum at the center
and minimum at the edges.

Figure 5.6: Standing-wave oscillator description.

Transmission lines implemented in CMOS back-ends are lossy. This has to be taken into
account by distributing compensation cells along the lines. These compensation cells should
present a negative conductance, as cross-coupled inverters do. Let us describe the behavior of
the voltage along the line to quantify the required compensation. A line model with distributed
coupled inverters is presented in Fig. 5.6. The voltage on the standing-wave oscillator is
$V(z) = V_0(e^{\gamma z} + \Gamma e^{-\gamma z})$, with $\gamma$ the propagation constant and $\Gamma$ the reflection
coefficient. As the edges are shorted, $\Gamma = -1$. That leads to:

$$V(z) = 2V_0 \sinh(\gamma z) \qquad (5.20)$$

If a lossless transmission line is considered, then $\gamma = j\beta$ and:

$$V(z) = j\,2V_0 \sin(\beta z) \qquad (5.21)$$

That means that the phase is constant and that the amplitude rises from 0 at the edge to $2V_0$ at
the center of the transmission line with a sine shape.

For a lossy transmission line, an attenuation constant is added to the propagation constant and
will degrade the phase of the standing wave. Thus, $\gamma = \alpha + j\beta$, which can also be expressed as:

$$\gamma = \sqrt{(R + j\omega L)(G + j\omega C)} \qquad (5.22)$$


when  considering  the  RLCG  distributed  parameters  of  the  line.  A  low-loss  approximation  can  be 
used  for  linking  the  RLCG  parameters  to  a  and  /3 .  This  approximation  assumes  that 


G 

Cco 


<  1 


(5.23) 
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(5.24) 


£«! 

Leo 

and  is  valid  for  high  frequencies  (over  1GHz).  This  leads  to: 

1  ,R 

a  —  » (—  +  GZo) 

2  Z  o 

j3  =  (oVLC 


(5.25) 


(5.26) 


,  where  Zo  ~  y 
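As a quick numerical check of the low-loss approximation, the sketch below compares the exact propagation constant of (5.22) with the approximations (5.25) and (5.26). The RLCG values are hypothetical placeholders chosen to give a 50 Ω line; they are not taken from this study.

```python
import cmath
import math

def gamma_exact(R, L, G, C, f):
    """Exact propagation constant (5.22), returned as (alpha [Np/m], beta [rad/m])."""
    w = 2.0 * math.pi * f
    g = cmath.sqrt((R + 1j * w * L) * (G + 1j * w * C))
    return g.real, g.imag

def gamma_low_loss(R, L, G, C, f):
    """Low-loss approximation (5.25)-(5.26) with Z0 = sqrt(L/C)."""
    w = 2.0 * math.pi * f
    z0 = math.sqrt(L / C)
    alpha = 0.5 * (R / z0 + G * z0)
    beta = w * math.sqrt(L * C)
    return alpha, beta

# Hypothetical per-unit-length parameters (ohm/m, H/m, S/m, F/m).
R_EX, L_EX, G_EX, C_EX = 5e3, 4e-7, 0.0, 1.6e-10

for f_hz in (2e9, 10e9, 20e9):
    a_x, b_x = gamma_exact(R_EX, L_EX, G_EX, C_EX, f_hz)
    a_a, b_a = gamma_low_loss(R_EX, L_EX, G_EX, C_EX, f_hz)
    print(f"{f_hz / 1e9:4.0f} GHz  alpha: exact {a_x:7.2f}, approx {a_a:7.2f}"
          f"  beta: exact {b_x:8.1f}, approx {b_a:8.1f}")
```

With these placeholder values the approximation tightens as the frequency rises, since $R/(L\omega)$ shrinks.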

Substituting a complex $\gamma$, with a nonzero real part $\alpha$, into the voltage equation yields Fig. 5.7, which shows the skew, normalized to the clock period, versus the position on the transmission line. We observe that a higher clock frequency leads to a lower skew between the edges and the center of the lines.

Figure 5.7: Skew relative to the clock period versus position on the transmission line, for various oscillation frequencies.
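The skew computation can be sketched as follows: evaluate the phase of $\sinh(\gamma z)$ from (5.20) at a given position and compare it with the phase at the center. This is a minimal illustration, not the script used for Fig. 5.7; the RLCG values are hypothetical, and `frac_to_center` is a name introduced here (0 at the shorted edge, 1 at the center).

```python
import cmath
import math

def relative_skew(R, L, G, C, f, frac_to_center):
    """Phase offset relative to the line center, as a fraction of the clock period.

    frac_to_center must satisfy 0 < frac_to_center <= 1 (the voltage is zero at the edge).
    """
    w = 2.0 * math.pi * f
    gamma = cmath.sqrt((R + 1j * w * L) * (G + 1j * w * C))
    z_center = math.pi / (2.0 * gamma.imag)        # beta * z = pi/2, i.e. z = lambda/4
    z = frac_to_center * z_center
    phase = cmath.phase(cmath.sinh(gamma * z))     # V(z) is proportional to sinh(gamma z)
    phase_center = cmath.phase(cmath.sinh(gamma * z_center))
    return (phase - phase_center) / (2.0 * math.pi)

# Hypothetical per-unit-length parameters (ohm/m, H/m, S/m, F/m).
LINE = dict(R=5e3, L=4e-7, G=0.0, C=1.6e-10)

for f_hz in (2e9, 10e9, 20e9):
    s = relative_skew(f=f_hz, frac_to_center=0.5, **LINE)
    print(f"{f_hz / 1e9:4.0f} GHz: skew at the lambda/8 point = {100 * s:+.2f} % of the period")
```

For a frequency-independent R, the magnitude of the relative skew falls as the frequency rises, which matches the trend described for Fig. 5.7.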


Figure  5.8:  Compensation  cell  based  on  a  cross-coupled  NMOS  pair  (a)  and  associated  equivalent 
circuit  (b). 


5.4.2 Compensation of the coupled standing-wave oscillator


Compensation cells are distributed along the transmission line and introduce a negative conductance that compensates for the losses of the line. The compensation cells are designed as cross-coupled NMOS transistors driven by a current source (Fig. 5.8a). The current is supplied at the edges of the line by connecting these nodes to Vdd. As the total resistance of the line is low, the DC level is roughly Vdd everywhere on the line, to within a few mV. The compensation cells introduce a conductance $-G_m$ and an additional parallel capacitance $C_m$, as shown in Fig. 5.8b. $C_m$ is equal to $C_{gs} + 4C_{gd}$ and the total conductance to $-G_m + G_o$, the latter term being negligible compared with $G_m$. Including $G_m$ and $C_m$, the components of the propagation constant become:

$$\alpha \approx \frac{1}{2}\left(R\sqrt{\frac{C + C_m}{L}} + (G - G_m)\sqrt{\frac{L}{C + C_m}}\right) \tag{5.27}$$

$$\beta = \omega\sqrt{L(C + C_m)} \tag{5.28}$$

The compensation is exact when $\alpha = 0$, which leads to:

$$G_m = \frac{R}{Z_0^2} + G \tag{5.29}$$

where, following (5.27), $Z_0$ is the characteristic impedance of the loaded line, $\sqrt{L/(C + C_m)}$. $C_m$ therefore reduces the effect of $-G_m$, and it also shortens the line for a given oscillation frequency. In a real system, $G_m$ would be chosen 1.5 to 2 times larger than this minimum value in order to start and sustain the oscillation with an acceptable signal swing.
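The compensation condition can be checked numerically. The sketch below evaluates the loaded attenuation of (5.27), the minimum $G_m$ obtained by setting it to zero, and the loaded $\lambda/2$ length from (5.28). All quantities are per unit length, and the numbers are hypothetical placeholders rather than values from this design.

```python
import math

def alpha_loaded(R, L, G, C, Gm, Cm):
    """Attenuation constant of the compensated line, as in (5.27)."""
    z0 = math.sqrt(L / (C + Cm))           # characteristic impedance of the loaded line
    return 0.5 * (R / z0 + (G - Gm) * z0)

def gm_min(R, L, G, C, Cm):
    """Gm that makes alpha_loaded zero: R / Z0^2 + G with the loaded Z0."""
    return R * (C + Cm) / L + G

def half_wavelength(L, C, Cm, f):
    """Length of a lambda/2 line, using beta = omega * sqrt(L * (C + Cm))."""
    return 1.0 / (2.0 * f * math.sqrt(L * (C + Cm)))

# Hypothetical line (ohm/m, H/m, S/m, F/m) and added capacitance per meter.
R_EX, L_EX, G_EX, C_EX, CM_EX = 5e3, 4e-7, 0.0, 1.6e-10, 0.4e-10

g_req = gm_min(R_EX, L_EX, G_EX, C_EX, CM_EX)
print("minimum Gm (S/m):", g_req)
print("residual alpha (Np/m):", alpha_loaded(R_EX, L_EX, G_EX, C_EX, g_req, CM_EX))
print("lambda/2 at 10 GHz, unloaded (m):", half_wavelength(L_EX, C_EX, 0.0, 10e9))
print("lambda/2 at 10 GHz, loaded (m):", half_wavelength(L_EX, C_EX, CM_EX, 10e9))
```

The loaded case needs a larger $G_m$ and gives a shorter line than the unloaded one, which is the double penalty of $C_m$ noted above.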

A methodology has been defined to size the transistor width and the required current so as to meet the desired specifications. NMOS transistors in a 65nm process have been characterized using $f_t$, which gives the ratio between $G_m$ and $C_m$ of the cross-coupled pair. It is defined here as:

$$f_t = \frac{G_m}{2\pi C_m} \tag{5.30}$$

Figure 5.9: $f_t$ versus $G_m/I_d$ for 65nm NMOS transistors.

Fig. 5.9 presents the relation between $f_t$ and $G_m/I_d$ for a minimum-length, 1 µm wide transistor. This relates the current and the transistor width to the $G_m$ and $C_m$ design parameters. Thus, $f_t$ can be taken as the main parameter for analyzing the compensation cell.
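Because $C_m$ grows with $G_m$ through $f_t$, the required $G_m$ has to be solved self-consistently. One way to combine (5.29) and (5.30), assuming a design factor $k$ applied to the minimum $G_m$, is the closed form sketched below. The function name, the factor `k` and the numeric values are illustrative assumptions, not the Matlab code used in this study.

```python
import math

def design_gm(R, L, G, C, f_t, k=2.0):
    """Solve Gm = k * (R * (C + Cm) / L + G) together with Cm = Gm / (2 * pi * f_t).

    Returns (Gm, Cm) per unit length. Raises ValueError when f_t is so low that
    the capacitance added by the cells outruns the conductance they provide.
    """
    denom = 1.0 - k * R / (2.0 * math.pi * f_t * L)
    if denom <= 0.0:
        raise ValueError("f_t too low for this line: no finite Gm satisfies the condition")
    gm = k * (R * C / L + G) / denom
    cm = gm / (2.0 * math.pi * f_t)
    return gm, cm

# Hypothetical line parameters (ohm/m, H/m, S/m, F/m).
for f_t_hz in (20e9, 50e9, 100e9):
    gm, cm = design_gm(5e3, 4e-7, 0.0, 1.6e-10, f_t_hz)
    print(f"f_t = {f_t_hz / 1e9:5.0f} GHz -> Gm = {gm:.3f} S/m, Cm = {cm * 1e12:.2f} pF/m")
```

Biasing the transistors at a higher $f_t$ lowers the required $G_m$ per meter in this model, because less capacitance is added for a given conductance.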

Using Matlab to evaluate the transmission line and compensation cell formulas, the required $G_m$ per meter is plotted versus $f_t$ in Fig. 5.10a for operating frequencies from 2GHz to 20GHz. For this design, we take 2 times the minimum $G_m$ that exactly compensates for the losses. The current consumption per meter is plotted in Fig. 5.10b. The length of a $\lambda$ line is computed from the added capacitance and is plotted in Fig. 5.10d. Finally, the power consumption for a whole wafer can be derived from the current and the length of the line. The power consumption depends on the pattern chosen for the coupled standing-wave oscillators. For this computation, a simple $\lambda/2 \times \lambda/2$ pattern is chosen, although the pitch would be smaller in the real pattern. The power consumption, for a 1V Vdd, is then:

$$P = \frac{4 I_d}{\lambda} \times \frac{\pi d^2}{4} = \frac{I_d \times \pi d^2}{\lambda} \tag{5.31}$$

where $d$ is the diameter of the wafer and $I_d$ is the current in A/m. Fig. 5.10c depicts the power consumption versus $f_t$ for various oscillation frequencies. From these curves, the whole system is characterized by choosing an $f_t$ value. To choose the design point, the swing of the oscillation must be considered. The amplitude is roughly equal to the total tail current $I_{tail}$ multiplied by the equivalent parallel resistance of the tank, $R_{eq}$. As an example, for this particular configuration, a 2mA current per compensation cell translates into a swing of about 750 to 800mVpp. This value can then be plotted on the graph of Fig. 5.10b to derive the $f_t$ value for the chosen oscillation frequency. Fig. 5.11 illustrates this choice for a 10GHz oscillation frequency. Table 5.1 summarizes the design values for this operating point, with 12 µm wide transmission lines spaced by 4 µm and oscillating at 10GHz. Note that the wafer power is also computed for a $\lambda/8 \times \lambda/8$ pattern.

Figure 5.10: $2G_{m,min}$ (a), current consumption (b), power consumption (c) and length of line (d) versus $f_t$ for various oscillation frequencies.
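The wafer power estimate of (5.31) is simple enough to script. In the sketch below, the factor $4/\lambda$ is read as the length of line per unit wafer area, which is what a square grid of pitch $\lambda/2$ gives (2/pitch); that reading, the function name and the example numbers are assumptions made for illustration.

```python
import math

def wafer_power(i_d, wavelength, wafer_diameter, vdd=1.0):
    """Wafer power for a lambda/2 x lambda/2 pattern, following (5.31).

    i_d: current drawn per meter of line (A/m); wavelength and wafer_diameter in meters.
    """
    line_length_per_area = 4.0 / wavelength            # meters of line per square meter
    wafer_area = math.pi * wafer_diameter ** 2 / 4.0   # square meters
    return vdd * i_d * line_length_per_area * wafer_area

# Hypothetical inputs: 2 A/m of line, a 14.24 mm wavelength, a 300 mm wafer, 1 V supply.
print(f"{wafer_power(2.0, 14.24e-3, 0.3):.1f} W")
```

Since the power scales as $d^2/\lambda$, any shortening of the loaded line raises the wafer power for the same current per meter.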

When a real current source is simulated instead of the ideal one, as described in Fig. 5.12, the skew between the center and other positions on the line is degraded. This is mainly because the current drawn by the sources depends on the voltage swing, which itself depends on the position on the line.

Figure 5.11: Current consumption versus $f_t$ and design point for a 10GHz standing-wave oscillator.


5.4.3  Design  optimization 


To optimize the design in terms of power, recall that the signal swing is $Amp \approx I_{tail} \times R_{eq}$. Reducing the consumed current for the same signal swing therefore requires increasing the equivalent parallel impedance. Compensation cells have a very small effect on the impedance seen from the middle of the line, $Z_{in}$, which is a strong function of the width and spacing of the transmission lines. The RLCG parameters are plotted in Fig. 5.13 versus the width W and space S of the lines. Next, $Q$ and $Z_{in}$ are computed using:

$$Q = \frac{\omega L}{R} \tag{5.32}$$

$$Z_{in} = \frac{L}{CR} \tag{5.33}$$


Only the inductive Q is considered here, as the model of the line is simple and does not include a finite ground plane for the low metal layer, which leads to a very small G value. The $Q$ and $Z_{in}$ plots are shown in Fig. 5.14. From these plots, the design point chosen to minimize the power consumption is a 6 µm width and a 20 µm space. The quality factor of the transmission line should therefore be equal to 4. Note that another trade-off might be chosen if a finite ground thickness is modeled. Moreover, a quick study has shown that a slotted ground would be beneficial, as it increases $Z_{in}$ while keeping the Q factor almost constant. When implemented, the transmission line with slotted ground should be carefully modeled so that the simulation takes all effects into account. The model has not been detailed so far, because the main purpose of this study was to draw a picture of possible achievements and improvements.

Figure 5.12: Compensation cell including the mirror-based current source.

Table 5.1: Design parameters for a 12 µm wide, 4 µm space standing-wave oscillator at 10GHz

| Parameter | Ideal current source | Real current source |
|---|---|---|
| $\lambda/2$ line length | 7.12mm | 7.12mm |
| Cross-coupled transistors width | 11.1µm | 11.1µm |
| Cross-coupled pairs current | 2mA | 2mA |
| Wafer power for $\lambda/2 \times \lambda/2$ pattern | 39W | 39W |
| Wafer power for $\lambda/8 \times \lambda/8$ pattern | 310W | 310W |
| Amplitude at the center | 824mVpp | 862mVpp |
| Skew (between center and $\lambda/8$ point) | 0.9ps | 2.6ps |
| Skew (between center and $\lambda/16$ point) | Ups | 4ps |

Table 5.2 summarizes the design parameters for the chosen 6 µm wide, 20 µm space transmission lines. The power consumption is about a quarter lower than in the previous results for the same performance of the standing-wave oscillator. Furthermore, the consumption can be decreased further, at the expense of the swing amplitude, by reducing the compensating negative conductance to $1.5\,G_{m,min}$. This saves an additional third of the power and decreases the skew, but the amplitude of the oscillation is lower (650mVpp instead of 860mVpp in Table 5.2). The skew decrease is easily explained: the compensation cells add a $G_m$ that compensates for the losses, but since the $G_m$ introduced is higher than $G_{m,min}$, extra "losses" are introduced and increase the skew. In fact, if $G_m = 2 \times G_{m,min}$, the skew is equal to that of the uncompensated line (although in the present case the oscillation is maintained at an acceptable level). Decreasing $G_m$ to $1.5\,G_{m,min}$ appears to be a good trade-off between power, swing and skew.

Figure 5.13: R (a), L (b), G (c) and C (d) parameters of the transmission line versus width and space.
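A short derivation makes the skew argument explicit. It rests on one simplifying assumption that is not in the analysis above: $C_m$, and hence the loaded $Z_0$, is held fixed while $G_m$ is scaled by a factor $k$. Writing $G_m = k\,G_{m,min}$ in (5.27) and using (5.29):

```latex
\begin{align*}
\alpha(k) &= \frac{1}{2}\left(\frac{R}{Z_0} + \left(G - k\,G_{m,min}\right) Z_0\right),
  \qquad Z_0 = \sqrt{\frac{L}{C + C_m}} \\
          &= \frac{Z_0}{2}\left(\frac{R}{Z_0^2} + G - k\,G_{m,min}\right)
           = \frac{Z_0}{2}\,(1 - k)\,G_{m,min} \\
          &= (1 - k)\,\alpha(0)
\end{align*}
```

Here $\alpha(0)$ is the attenuation of the uncompensated line. Under this assumption $|\alpha(2)| = \alpha(0)$, the same magnitude as with no compensation, while $|\alpha(1.5)| = \alpha(0)/2$. In practice $C_m$ changes with the transistor width, so the result is indicative only.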

5.4.4  Tolerance  of  the  standing-wave  oscillator  to  PVT  variations 

To evaluate the tolerance of the standing-wave oscillator to variations, the previous model has been used and its parameters have been changed one at a time. The influence on oscillation frequency, amplitude swing and clock skew has been studied. First, the physical dimensions of the transmission line are modified (width, length, space, thickness of the metal layer and height of the dielectric). Next, the process, voltage and temperature are modified. Finally, the worst case over all deviations is considered, since running extensive simulations over all corners is too time-consuming.

The influence of physical parameters on frequency, amplitude and skew is shown in Table 5.3.

Figure 5.14: $Q$ (a) and $Z_{in}$ at the center of the line (b) versus width and space.

Table 5.2: Design parameters for a 6 µm wide, 20 µm space standing-wave oscillator at 10GHz

| Parameter | $2 \times G_{m,min}$ | $1.5 \times G_{m,min}$ |
|---|---|---|
| $\lambda/2$ line length | 6.725mm | 6.875mm |
| Cross-coupled transistors width | 7.4µm | 5.7µm |
| Cross-coupled pairs current | 1.34mA | 1mA |
| Wafer power for $\lambda/2 \times \lambda/2$ pattern | 29W | 21W |
| Wafer power for $\lambda/8 \times \lambda/8$ pattern | 232W | 170W |
| Amplitude at the center | 860mVpp | 650mVpp |
| Skew (between center and $\lambda/8$ point) | 3ps | 1.1ps |
| Skew (between center and $\lambda/16$ point) | 4.7ps | 1.9ps |

The length of the line has the strongest influence on the frequency, as a 1% change in length translates into a 1% change in frequency. The amplitude is affected by all deviations, and the skew is mostly influenced by the height of the dielectric, with about 0.1ps of variation per percent of height variation. Since process accuracy is specified in absolute terms, the relative variations are obtained by assuming a 40nm imprecision on the metal layer drawing, a 0.15 µm imprecision on the metal layer thickness and a 0.3 µm imprecision on the dielectric height, which are typical values for nanoscale CMOS processes. This leads to the relative variations of Table 5.3. It is notable that vertical variations have a higher relative influence than horizontal variations. Finally, absolute variations are mainly driven by the metal layer thickness and dielectric height. Taking all these variations into account leads to a frequency variation of roughly 70MHz, an amplitude variation of 200mV and a skew deviation of 1.5ps. Frequency and skew deviations seem low, while the amplitude is strongly affected by physical parameters.
Table 5.3: Variations of frequency, amplitude and clock skew related to physical parameters of the transmission lines

| | Width | Length | Space | Thickness | Height |
|---|---|---|---|---|---|
| ΔFrequency (MHz/%) | 7.4 | -100 | 2.9 | -2.6 | -1.5 |
| ΔAmplitude (mV/%) | -3 | -6.75 | 2.15 | 6.2 | 8.95 |
| ΔSkew (fs/%) | -15 to -20 | -30 to -40 | 15 to 20 | 35 to 50 | 70 to 100 |
| Relative variations | 0.3% | 0.0005% | 1% | 15% | 12% |
| ΔFrequency (MHz) | 2.22 | -0.05 | 2.9 | -39 | -18 |
| ΔAmplitude (mV) | -1 | -0.003 | 2.15 | 93 | 107 |
| ΔSkew (fs) | -6 | -0.02 | 20 | 750 | 1200 |
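The absolute rows of Table 5.3 follow from multiplying each per-percent sensitivity by the corresponding relative variation. The sketch below reproduces that step for frequency and amplitude. How the individual contributions are combined into a single figure is not stated here, so the two combinations shown (sum of magnitudes and root-sum-square) are assumptions offered for comparison only.

```python
import math

PARAMS = ["width", "length", "space", "thickness", "height"]
REL_VARIATION_PCT = [0.3, 0.0005, 1.0, 15.0, 12.0]     # "Relative variations" row
FREQ_MHZ_PER_PCT = [7.4, -100.0, 2.9, -2.6, -1.5]      # sensitivity rows of Table 5.3
AMPL_MV_PER_PCT = [-3.0, -6.75, 2.15, 6.2, 8.95]

def contributions(sensitivities, rel_variations):
    """Absolute deviation caused by each parameter (sensitivity x relative variation)."""
    return [s * r for s, r in zip(sensitivities, rel_variations)]

def worst_case(contribs):
    """All deviations assumed to add with the same sign."""
    return sum(abs(c) for c in contribs)

def root_sum_square(contribs):
    """Deviations assumed to be independent."""
    return math.sqrt(sum(c * c for c in contribs))

freq = contributions(FREQ_MHZ_PER_PCT, REL_VARIATION_PCT)
ampl = contributions(AMPL_MV_PER_PCT, REL_VARIATION_PCT)
for name, df, da in zip(PARAMS, freq, ampl):
    print(f"{name:10s} dF = {df:8.3f} MHz   dA = {da:8.3f} mV")
print("frequency: worst case", worst_case(freq), "MHz, RSS", root_sum_square(freq), "MHz")
print("amplitude: worst case", worst_case(ampl), "mV, RSS", root_sum_square(ampl), "mV")
```

Either combination shows the same ranking: metal thickness and dielectric height dominate, while width, length and space contribute very little.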

Process variations on the transistors of the cross-coupled pairs, as well as voltage and temperature variations, are analyzed in the same way. Their effects on all design parameters are small. Table 5.4 summarizes these variations. The process has been varied from SS to FF, the temperature has been swept from 27°C to 80°C, and the voltage has been varied by ±200mV.

We can conclude that metal thickness and dielectric thickness variations dominate the global variations in frequency, amplitude and clock skew, and that the standing-wave oscillator architecture is quite tolerant to all types of variations encountered on a whole wafer. Frequency offsets of only about 100MHz would be observed between 10GHz oscillators on different reticles, which seems good enough to synchronize these oscillators at a large scale. The clock skew variation would be less than 2ps, considering the center of each oscillator, which is a good value for synchronizing several transceivers. However, the amplitude variation is quite large, around 300mV for a 1V supply, and would contribute to further mismatch between clusters. To counteract this effect, a feedback system able to track the voltage swing and adjust the bias current to correct the variation must be investigated.

Table 5.4: Variations of frequency, amplitude and clock skew related to physical and PVT variations.

| | Physical param. | Process (SS or FF) | Temperature | Vdd |
|---|---|---|---|---|
| ΔFrequency | 70MHz | 10MHz | 10MHz | 4MHz |
| ΔAmplitude | 200mV | 10mV | 30mV | 60mV |
| ΔSkew | 1.5ps | 0.3ps | 0.2ps | 0.1ps |
5.4.5 Clock buffer design and evaluation

As clocks are used both as the 90GHz PLL reference and for the baseband circuitry, a buffer is useful in the latter case to square up the standing-wave signal. If the clock is picked at different positions on the line (i.e. center, 25%, 50% and 75%), the amplitude of the input signal will differ. A buffer whose propagation delay is independent of the input amplitude is therefore needed. Moreover, the buffer must be tolerant to PVT variations.

To make the buffer delay independent of the input amplitude, fixed dummy inverters load the inverter chains to account for the sinusoidal amplitude distribution along the line. The dummy slows down the first inverter and equalizes the delay for the specified input amplitude. The dummy sizes are 0.94, 0.84 and 0.53 for the center, 25% and 50% of the line, respectively, as described in Fig. 5.15. The buffer chains are AC coupled to the line, and the input load and output drive are the same for the different buffers. The bias point is fixed at 0.5V.


Figure 5.15: Clock buffer including dummy inverters for propagation delay equalization. (Diagram labels: center of the line; 75% of the line; amplitude decreases; dummy; AC coupling to the line; dummies for 25% and 50% are 0.84 and 0.53, respectively.)

The propagation delay of the center buffer is plotted in Fig. 5.16a for process and temperature variations. The propagation delay may vary from 23 to 38ps, a 15ps difference. Fig. 5.16b presents the skew for different buffers on the same line; the skew difference may be up to 2ps. The influence of the supply voltage on the propagation delay and skew is plotted in Fig. 5.17. It is of the same order as that of process and temperature. Altogether, the center buffer simulated at SS, 80°C, 0.8V presents a 23ps difference compared to TT, 27°C, 1V, and the skew increases by 4ps. Thus, the global skew due to PVT variations over the wafer for a simple buffer is equal to 27ps, which is more than a quarter of the 100ps clock period when operating at 10GHz. As these variations are rather large, another architecture is needed to compensate for them. For example, a DLL approach, as shown in Fig. 5.18, can help reduce the global skew difference. Moreover, an optimized pattern in which the clock is always picked at the same position on every line would be preferable for a global skew reduction.

Figure 5.16: Buffer propagation delay (a) and worst skew (b) versus process and temperature corners. (Panel axes: propagation delay (ps); worst skew (ps) for different amplitudes and different dummies.)

Figure 5.17: Buffer propagation delay (a) and worst skew (b) versus supply voltage.


5.4.6  Locking  range  of  coupled  standing-wave  oscillators 

Coupling two standing-wave oscillators can be achieved in many different ways, depending on the pattern chosen to cover the area. A simple coupling of 4 oscillators is shown in Fig. 5.19. All standing-wave oscillators are coupled at exactly the same position to ensure locking at the specified frequency. The coupling is very strong, as it is made with vias between two metal layers, which add only a few ohms of interconnect resistance.

Figure 5.18: Proposed DLL-based approach for compensating the PVT variations on the clock buffers.

A quick analysis of the coupling behavior is provided here to show how strong the coupling mechanism is. We analyze the locking range, which roughly defines the maximum frequency deviation allowed for two oscillators to lock to each other. From Adler's equation, we can derive:

$$\Delta\omega_{lock} = \frac{\omega_0}{2Q}\,\frac{A_{inj}}{A} \tag{5.34}$$

with $A_{inj}$ and $A$ the relative amplitudes of the injecting signal and of the oscillator signal, both at the center of the line, $Q$ the quality factor of the line and $f_0$ the oscillation frequency ($\omega_0 = 2\pi f_0$). As an example, with a Q of 4 and a coupling at the center of the line ($A_{inj} = A$), the locking range is equal to 1250MHz for a 10GHz oscillation frequency. With the architecture previously described, the total single-ended injected current is equal to 4.7mA. This theoretical result has been validated in simulations, which show locking for frequency differences of less than 1250MHz between two oscillators.

The locking range can be extended to other positions on the line:

$$\Delta\omega_{lock} = \frac{\omega_0}{2Q}\,\sin^2\!\left(\frac{2\pi z}{\lambda}\right) \tag{5.35}$$

with $z$ the position on the line, referenced to 0 at the edge. The ratio between the injected signal and the oscillator signal at the center follows a $\sin^2$ law because both the amplitude and the coupling strength are sinusoidally distributed along the line. The locking range therefore depends on the position, as stated in Table 5.5. As expected, the coupling is strongest in the middle of the line and very weak at the edges. Comparing these locking ranges with the frequency deviations due to PVT variations on a wafer, we are confident that many standing-wave oscillators can be synchronized on the same wafer, although extensive simulations at a larger scale are needed to confirm this point.
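The position dependence of the locking range is easy to tabulate. The sketch below assumes the Adler-type form $\Delta f_{lock} = \frac{f_0}{2Q}\sin^2(2\pi z/\lambda)$, which matches the 1250MHz quoted for a center coupling with Q = 4 at 10GHz; the function name and argument names are introduced here for illustration.

```python
import math

def locking_range_hz(f0, q, z_over_lambda=0.25):
    """Locking range in Hz for a coupling point at z (0 at the edge, lambda/4 at the center)."""
    return f0 / (2.0 * q) * math.sin(2.0 * math.pi * z_over_lambda) ** 2

for label, z in (("center (lambda/4)", 1 / 4), ("3*lambda/16", 3 / 16),
                 ("lambda/8", 1 / 8), ("lambda/16", 1 / 16)):
    print(f"{label:18s} {locking_range_hz(10e9, 4.0, z) / 1e6:8.1f} MHz")
```

Halving the distance to the edge from $\lambda/8$ to $\lambda/16$ cuts the locking range by more than a factor of three, so couplings made near the edges leave little margin against frequency offsets.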
Figure 5.19: Example of coupling between 4 standing-wave oscillators.


5.4.7  Antenna  co-design 

For a good integration of the transceivers with the distribution and synchronization scheme, the clock distribution pattern must be co-designed with the antenna pattern. A typical slot antenna is represented in Fig. 5.20; such antennas will form a large distributed array. The pitch of the array would be $\lambda_{antenna}/2$, where $\lambda_{antenna}$ is defined by:

$$\lambda_{antenna} = \frac{c}{90\,\mathrm{GHz}} \tag{5.36}$$

On the other hand, the length of the standing-wave transmission line will be $\lambda_{line}/2$, with $\lambda_{line}$ roughly equal to:

$$\lambda_{line} = \frac{c}{10\,\mathrm{GHz} \times \sqrt{\varepsilon_r}} \tag{5.37}$$

This leads to a ratio between $\lambda_{line}$ and $\lambda_{antenna}$ of 4.5 to 5 for typical values of $\varepsilon_r$. This holds for an unloaded line; since compensation cells are distributed along the line, the line is shorter and the ratio is roughly equal to 4. Thus, the easiest way to fit the clock distribution pattern to this antenna pattern is to follow the perimeter of the antenna.
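The wavelength ratio can be checked with a few lines, taking $\lambda_{antenna} = c/f_{antenna}$ in free space and $\lambda_{line} = c/(f_{clock}\sqrt{\varepsilon_r})$ for the unloaded line. The permittivity values used in the loop are assumed, typical-looking numbers rather than values from this design.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def antenna_wavelength(f_antenna=90e9):
    """Free-space wavelength at the antenna frequency, in meters."""
    return C0 / f_antenna

def line_wavelength(eps_r, f_clock=10e9):
    """Wavelength on an unloaded line with relative permittivity eps_r, in meters."""
    return C0 / (f_clock * math.sqrt(eps_r))

def wavelength_ratio(eps_r, f_clock=10e9, f_antenna=90e9):
    """Ratio lambda_line / lambda_antenna."""
    return line_wavelength(eps_r, f_clock) / antenna_wavelength(f_antenna)

print(f"antenna pitch (lambda_antenna / 2): {antenna_wavelength() / 2 * 1e3:.3f} mm")
for eps_r in (3.3, 3.6, 4.0):  # assumed dielectric constants
    print(f"eps_r = {eps_r}: lambda_line / lambda_antenna = {wavelength_ratio(eps_r):.2f}")
```

The ratio reduces to $9/\sqrt{\varepsilon_r}$, so it depends only on the dielectric and on the 9:1 frequency ratio between the antenna and the clock.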
Table 5.5: Locking range according to position on the line

| Position | Center ($z=\lambda/4$) | $z=3\lambda/16$ | $z=\lambda/8$ | $z=\lambda/16$ |
|---|---|---|---|---|
| Locking range | 1.25GHz | 1.06GHz | 625MHz | 180MHz |

Figure 5.20: 90GHz slot antenna layout pattern. (Dimension marked in the figure: 1.66mm.)


Fig. 5.21 shows a proposed implementation of the clock distribution pattern in which the transmission lines follow the perimeter of each antenna. In this particular implementation, the clock is taken at exactly the same position ($\lambda/4$ away from the edges) on all the lines, which helps reduce the skew between clusters. A particularity of this scheme is that the coupling with the neighbors is either strong, when made near the center, or weak, when made near the edge. This results in two columns of clusters being strongly coupled together but weakly coupled to the adjacent two columns. Once again, this must be modeled and simulated to determine whether it prevents all the oscillators from locking together correctly.

Moreover, the influence of the clock distribution on the antenna radiation pattern is hard to predict, and advanced electromagnetic simulations are required to evaluate the interactions between them. At first sight, the transmission lines would act as a shield ring between two antenna clusters and would be beneficial for the antenna radiation pattern. However, the current flowing along these lines could affect the radiation pattern. This must be investigated to ensure a good co-integration.

Figure 5.21: Proposed clock distribution pattern co-designed with the antenna array.


The full simulation of coupled standing-wave oscillators using RLCG networks in Cadence is suitable for studying the coupling between up to 4 or 8 oscillators. To go further and study the locking behavior at a large scale, other tools or methods are needed. This has been discussed with Prof. Jaijeet Roychowdhury's group at UC Berkeley, and two approaches appear suitable. The first is to reduce the RLCG networks, which would lower the number of nodes and speed up the simulation. We would thus be able to simulate more elements using the same testbench in Cadence, but this does not solve the problem of very large-scale systems. The second and more robust solution is to define a macromodel from the behavior and characteristics of a single standing-wave oscillator and to model the coupling between two of them. We would then be able to use a higher-level simulator to understand the locking issues of a particular pattern and go deeper into the co-integration of the antenna clusters with the clock distribution scheme. We plan to collaborate with Prof. Jaijeet Roychowdhury's group to set up an appropriate behavioral model for the large-scale simulation of coupled standing-wave oscillators.
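To give a flavor of what a phase-only macromodel might look like, the sketch below integrates a generic Adler/Kuramoto-type model for a chain of oscillators with nearest-neighbor coupling. It is not the behavioral model planned with the Berkeley group: the model form, the function names, the frequency offsets and the coupling values are all assumptions chosen for illustration.

```python
import math

def simulate_chain(offsets_hz, lock_range_hz, t_end=20e-9, dt=1e-12):
    """Euler-integrate d(theta_i)/dt = dw_i + K * sum_j sin(theta_j - theta_i).

    offsets_hz: free-running frequency offset of each oscillator (Hz).
    lock_range_hz: assumed coupling strength K / (2*pi) of one neighbor link (Hz).
    Returns the instantaneous frequency offset of each oscillator at t_end (Hz).
    """
    n = len(offsets_hz)
    dw = [2.0 * math.pi * f for f in offsets_hz]
    k = 2.0 * math.pi * lock_range_hz
    theta = [0.0] * n

    def rates(th):
        out = []
        for i in range(n):
            pull = 0.0
            if i > 0:
                pull += math.sin(th[i - 1] - th[i])   # left neighbor
            if i < n - 1:
                pull += math.sin(th[i + 1] - th[i])   # right neighbor
            out.append(dw[i] + k * pull)
        return out

    for _ in range(int(round(t_end / dt))):
        r = rates(theta)
        theta = [t + dt * x for t, x in zip(theta, r)]
    return [x / (2.0 * math.pi) for x in rates(theta)]

def frequency_spread(freqs_hz):
    """Difference between the fastest and slowest oscillator; near zero when locked."""
    return max(freqs_hz) - min(freqs_hz)

if __name__ == "__main__":
    offsets = [50e6, -30e6, 10e6, -30e6]              # hypothetical offsets, Hz
    for k_hz in (625e6, 1e6):                         # strong and very weak coupling
        spread = frequency_spread(simulate_chain(offsets, k_hz))
        print(f"coupling {k_hz / 1e6:6.0f} MHz -> residual spread {spread / 1e6:.3f} MHz")
```

A model of this kind scales to thousands of oscillators at negligible cost, but it ignores amplitude dynamics, so it would need calibration against the RLCG simulations before its locking predictions could be trusted.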


5.5  Conclusions 

For synchronization of transceivers on a whole wafer, a clock distribution architecture based on coupled standing-wave oscillators has been shown to be suitable for achieving both low clock skew between clusters and low power consumption. Based on a simple transmission line model, simulations of standing-wave oscillators have been performed to evaluate their performance, and a design methodology has been defined for sizing the main parameters according to the desired performance.

The study of the influence of PVT variations on standing-wave oscillators has shown a good tolerance to variations. The main influence comes from the metal thickness and dielectric height variations. Worst-case deviations are low for frequency and clock skew (respectively a 100MHz frequency deviation for a 10GHz clock and a 2ps clock skew from one transmission line to another). Signal swing varies more with PVT variations, so a means of controlling the current drawn by the compensation cells is required. The clock buffer study has shown that this block is very sensitive to all types of variations and that a DLL-based buffer needs to be designed.

The coupling study has demonstrated that the coupling is very strong when oscillators are coupled at the center and weaker when they are coupled near the edges. Nevertheless, the locking range seems sufficient to obtain good locking behavior for many standing-wave oscillators. Co-design with the antenna array is mandatory to achieve a good integration of the global structure, and a possible pattern has been proposed. The simulation results obtained in this study, as well as the co-integration with the antenna, need to be confirmed with a real small-scale chip design in which realistic models for the lines would be considered and radiation pattern perturbations would be addressed.

Furthermore, this work has enabled collaborations on large-scale simulations of coupled standing-wave oscillators with Prof. Jaijeet Roychowdhury's group. Numerical simulation tools would help understand the coupling and locking behavior of this type of distributed oscillator at a large scale and would allow the design of an optimized clock distribution architecture for the distributed radio on a whole wafer.